Abstract

The advent of sophisticated software tools running on low-cost, powerful
computers has prodded the database community into moving beyond the
traditional data processing applications for which database systems were
originally designed. Office forms, computer-aided design, and statistical
database systems are but a few of the new applications for database systems
which require new approaches to database design and implementation. The
foremost model for database use in the last decade has been the relational
model. One of the primary assumptions used in the relational model is that all
relations must be in first normal form; that is, all values must be
non-decomposable units. This assumption unduly constrains our ability to model
data, especially for the non-traditional applications which are taxing our
current database systems. This research extends relational database theory by
relaxing the assumption that all relations in the database must be in first
normal form. Relations containing attributes which may be atomic-valued or
relation-valued are said to be in non-first normal form (non-1NF). In this
context, we develop a non-1NF model and an extended formal query language
based on the relational calculus, and prove its equivalence to a relational
algebra extended with nest and unnest operators to deal with non-1NF
relations. We define a property which non-1NF relations should satisfy, called
partitioned normal form (PNF), and develop a set of extended algebra operators
to manipulate non-1NF relations and maintain the PNF property. Our model and
the extended operators are then further extended to deal with null values and
empty nested relations. We present a user-oriented non-1NF query language,
called SQL/NF, which is based on the SQL commercial database language and a
proposed relational database language standard. Finally, we present a method
for achieving nested normal form, a form which eliminates anomalies due to
partial and transitive dependencies in PNF relations. The method differs from
previous algorithms by building non-1NF relations from an initial fourth
normal form decomposition, incorporating embedded multivalued dependencies
into the design, and improving upon the use of functional dependencies.
Chapter 1
Introduction 


1.1  Motivation 

There has been a flurry of activity in recent years in the development of
databases to support “high-level” data structures and complex objects. Office
forms [SLTC], computer-aided design [BaKi], information retrieval systems
[Schl, SP], and statistical databases [002] are a few examples of
non-traditional applications that require specialized database support. One of
the stumbling blocks in using traditional relational databases is the
assumption that all relations are required to be in first normal form (1NF);
that is, all values in the database are non-decomposable. The 1NF assumption
was a valid one for many years since database systems were geared to
traditional business data processing tasks such as payroll and inventory.
First normal form allowed a close mapping from physical data files to
relations and simplified the theoretical and implementation problems for a
model that was inefficient and slow to be accepted in early systems. Using
today’s computer technology, these reasons for relying on 1NF are not as valid
as they once were. We now have the resources to deal with the type of
applications mentioned above without constraining our databases by insisting
on adherence to the 1NF restriction.

For this reason, non-first normal form (¬1NF) relations were proposed in which
the attributes of the relation can take on values which are sets or even
relations themselves. Complex objects and semantically connected groups of
data can more easily be represented as sets of values. For example, an object
identifier is related to a set of object characteristics. This is more natural
than forcing the view that we have many (object identifier, object
characteristic) pairs, as we would have to when sets are not available. This
new assumption about sets in relations created a need to reexamine the
fundamentals of relational database theory and opened the door for the
introduction of new relational operators which take advantage of the nested
structure of ¬1NF relations.

To illustrate this, consider an employee relation in 1NF (Figure 1-1a), and a
possible ¬1NF structuring of it (Figure 1-1b). The ¬1NF relation has two
tuples,

(Smith, {(Sam), (Sue)}, {(typing), (filing)})

and

(Jones, {(Joe), (Mike)}, {(typing), (dictation), (data entry)}).

The Children and Skills attributes are nested relations each with one
attribute, child and skill, respectively. The ¬1NF relation makes clearer the
independent associations of employee and skill, and employee and child, and
reduces the data redundancy when compared with an equivalent 1NF relation.

Figure 1-1. Employee relation in (a) 1NF and (b) ¬1NF.

An additional advantage of using a ¬1NF structuring of the database is that
fewer relations are needed to design a database that minimizes redundancy, the
cause of several undesirable properties. To illustrate this, consider that in
the employee example of Figure 1-1a, we would expect that the employee-child
relationship is independent of the employee-skill relationship. Thus, if we
were to add a new child for employee “Jones,” we would have to add three
tuples, one for each skill that Jones currently has. Also, if we needed to
change Smith’s “typing” skill to “word processing” we would need to update a
tuple for every child of Smith. In order to reduce redundancy and avoid update
anomalies we would decompose this relation into its projections (employee,
child) and (employee, skill). This result is shown in Figure 1-2. To view the
entire database we must use a join operator to reassemble the original
relation. The ¬1NF relation of Figure 1-1b is free of the problems associated
with the 1NF relation and uses one relation rather than the two required in
the proposed decomposition. Furthermore, queries involving child and skill
will be simpler and more efficient since a join of decomposed relations is not
necessary.
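
The redundancy argument can be checked with a small sketch. The Python below is
only an illustration, not part of the model developed in this dissertation: it
assumes the 1NF relation of Figure 1-1a pairs every child of an employee with
every skill of that employee (as the text implies), and the names flatten,
emp_child, emp_skill and the added child "Pat" are hypothetical.

```python
from itertools import product

# Nested form of Figure 1-1b: one entry per employee, with set-valued
# Children and Skills attributes.
nested = {
    "Smith": ({"Sam", "Sue"}, {"typing", "filing"}),
    "Jones": ({"Joe", "Mike"}, {"typing", "dictation", "data entry"}),
}

def flatten(nested_rel):
    """Return the equivalent 1NF relation as (employee, child, skill) tuples."""
    return {
        (emp, child, skill)
        for emp, (children, skills) in nested_rel.items()
        for child, skill in product(children, skills)
    }

flat = flatten(nested)

# Adding one child to Jones is a single set insertion in the nested form,
# but it creates one new 1NF tuple for every skill Jones currently has.
nested["Jones"][0].add("Pat")  # "Pat" is a made-up child name
new_flat = flatten(nested) - flat

# The decomposition of Figure 1-2 and the join that reassembles the
# original 1NF relation.
emp_child = {(e, c) for e, c, _ in flat}
emp_skill = {(e, s) for e, _, s in flat}
rejoined = {(e, c, s) for e, c in emp_child for e2, s in emp_skill if e == e2}
```

Under these assumptions the flat relation holds ten tuples for two employees,
the single insertion produces three new flat tuples, and the join of the two
projections reproduces the original flat relation.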


Figure 1-2. Decomposition of employee relation of Figure 1-1a.

1.2 The State of ¬1NF Research

A significant amount of research in the area of ¬1NF relations has been done
since the idea was first proposed in [Mak]. Most of this work has been
published in the last four years and has been concentrated in the following
four areas:

(1) Development of data models to handle the new structure of ¬1NF relations.

(2) Development of a relational algebra for ¬1NF relations and the various
properties of such an algebra.

(3) Exploration of new data dependencies which characterize ¬1NF relations.

(4) Introduction of a new normal form for nested relations with goals similar
to traditional normalization techniques.

In addition, the first three of these areas developed in approximately three
stages. At first, nested relations were limited to single attributes and only
one level of nesting was allowed. This is the type of relation shown in Figure
1-1b. Then, the theory was generalized to many attributes and single levels,
and finally to many attributes and multi-level nesting. The particulars of
this previous work are summarized, for the most part, in Chapter 3, following
an introduction to the 1NF relational model in Chapter 2. Where appropriate,
we group related previous work with the chapter describing that area of new
results.


Notably missing from this body of work was the development of a relational
calculus for ¬1NF relations. A calculus is a more formal language than an
algebra and more clearly delineates the expressive power of the database
language for a particular model. Our first task was to develop a ¬1NF
relational calculus and prove its equivalence to the ¬1NF relational algebra.
This work was reported in [RKS1] and is presented here in Chapters 4-6.

Another significant area where results had not been developed at all was the
addition of null values to the ¬1NF model. The traditional null values could
still be allowed for single-valued attributes, but there was no attempt to
deal adequately with the problem of null values for nested relations. Empty
nested relations are a kind of null value and cause problems in query
languages if not carefully defined. An overview of research on null values and
our extensions of the ¬1NF model to include null values and empty nested
relations were reported in [RKS2] and are presented here in Chapter 7.


We have also explored several areas related to database languages and
normalization. These include extended relational algebra operators, with and
without null values in the model, a user-oriented language based on SQL (also
described in [RKB]), and a new method for achieving normalized ¬1NF relations.
These areas are summarized in the next section.

1.3  Overview 

In this section we present an overview of our contribution to the area of ¬1NF
relational database research. A major portion of this research is concerned
with the development of query languages, both formal and user-oriented, for
¬1NF databases. We define an extended relational calculus as the theoretical
basis for our ¬1NF database query language. We define an extended relational
algebra and prove its equivalence to the extended calculus. In addition to the
standard algebra operators, the extended algebra includes new operators, nest
and unnest, first described by Jaeschke and Schek [JS], for manipulating
nested relations. The nest operator forms nested relations by partitioning a
relation based on the values of some attributes and collecting all tuples on
the attributes being nested into a nested relation for each partition. The
unnest operator eliminates a nested relation, concatenating each tuple of the
nested relation with the other attributes of the relation.
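
The informal description of nest and unnest above can be sketched in a few
lines of Python. This is a minimal illustration, not the formal definition
given in Chapter 4: it assumes a tuple is a frozenset of (attribute, value)
pairs and a nested relation is a frozenset of such tuples, and the helper tup
and the function signatures are hypothetical.

```python
def tup(**attrs):
    """Build a tuple as a hashable set of (attribute, value) pairs."""
    return frozenset(attrs.items())

def nest(rel, nested_attrs, name):
    """Partition rel on the attributes not being nested and collect the
    nested_attrs values of each partition into one relation-valued
    attribute called name."""
    groups = {}
    for t in rel:
        row = dict(t)
        key = frozenset((a, v) for a, v in row.items() if a not in nested_attrs)
        inner = frozenset((a, row[a]) for a in nested_attrs)
        groups.setdefault(key, set()).add(inner)
    return {key | {(name, frozenset(inner))} for key, inner in groups.items()}

def unnest(rel, name):
    """Eliminate the nested relation name by concatenating each of its
    tuples with the remaining attributes of the enclosing tuple."""
    out = set()
    for t in rel:
        row = dict(t)
        inner_rel = row.pop(name)
        rest = frozenset(row.items())
        for inner in inner_rel:
            out.add(rest | inner)
    return out

flat = {
    tup(employee="Smith", skill="typing"),
    tup(employee="Smith", skill="filing"),
    tup(employee="Jones", skill="typing"),
}
nested = nest(flat, {"skill"}, "Skills")
```

Here nested contains one tuple per employee, and unnest(nested, "Skills")
returns the original flat relation. Note that in this sketch a tuple whose
nested relation is empty simply disappears under unnest, which is one reason
empty nested relations need the careful treatment discussed later.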

Formal languages are useful for defining the retrieval power of the database
language and as a basis for query optimization and implementation strategies.
However, they are generally not used at the user level of database
interaction. Therefore, we define a ¬1NF user language, called SQL/NF, which
is based on the widely used commercial language, SQL [C+]. SQL/NF is designed
using several important language design criteria [Dat2] and proposes a simpler
structure than a concurrently developed language for ¬1NF databases,
undertaken at IBM Heidelberg [PT]. The SQL/NF language has all of the power of
the extended relational calculus and algebra, adds capabilities for aggregate
and other functions of the data, and adds language facilities for dealing with
null values.

Null values also require a careful analysis in a more formal setting. Since we
have the ability to represent multiple relationships in a single ¬1NF relation
without the problems of redundancy that doing so in a 1NF relation would
entail, we must also deal with the fact that one or more of those
relationships may be unknown or non-existent at some time. This means that we
must deal with null values and their particular manifestation as empty nested
relations in the ¬1NF model. Thus, we look at a formalism for null values in
the 1NF setting and extend those results to the ¬1NF setting. We look at the
unknown, non-existent, and no-information interpretations of nulls and how the
empty nested relation fits into this framework. We use an open world
assumption, where we assume that not only do relations contain the known
information about the world, but that other information may belong there as
well. To handle the different types of nulls we create a lattice of
information based on the concept of more informative, with nothing more
informative than a known value or a non-existent null value and nothing less
informative than a no-information null value. Using these concepts, we show
that previous results [AM, Lie2] on the axiomatization of functional and
multivalued dependencies in the presence of null values are incorrect. By
using flawed reasoning concerning the inequality of non-existent nulls and the
set of actual values which can replace unknown and no-information nulls, these
authors modify the usual axiomatization of functional and multivalued
dependencies into a much weaker one. We show that the usual axiomatization
holds even in the presence of null values.
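
The "more informative" ordering can be illustrated with a small sketch. The
text above fixes only the extremes of the ordering (no-information at the
bottom, known values and non-existent nulls at the top); placing the unknown
null above no-information and below every known value is an assumption made
for this illustration, and the names Null and at_least_as_informative are
hypothetical.

```python
from enum import Enum

class Null(Enum):
    NI = "no-information"
    UNK = "unknown"
    DNE = "non-existent"

def at_least_as_informative(b, a):
    """Return True if b is at least as informative as a."""
    if a == b or a is Null.NI:
        # Everything is at least as informative as a no-information null.
        return True
    if a is Null.UNK:
        # Assumed: an unknown null asserts that some value exists, so only
        # a known value refines it; a non-existent null contradicts it.
        return not isinstance(b, Null)
    # Known values and non-existent nulls are maximal.
    return False
```

For example, at_least_as_informative("typing", Null.UNK) holds, while
at_least_as_informative(Null.DNE, Null.UNK) does not, and two distinct known
values are incomparable.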

We are also interested in the design of ¬1NF databases. There are two classes
of nested relations, based on the correspondence to their unnested
counterparts. Some nested relations cannot be created from the corresponding
unnested relation by any sequence of nest operators. An example is shown in
Figure 1-3. Note that Smith and Jones are not single partitions and that there
are two (Jones, typing) relationships, one of which would be eliminated in an
unnested version of this relation. Relations of this class have the annoying
property that there is not always a nest operation which will be an inverse
for an unnest operation. There is also no semantic justification for allowing
these relations. The relationship depicted in Figure 1-3 is that of employee
and skill. There is no reason why each employee’s skills should not all appear
in one nested relation for that employee. If different sets of skills are
related to employees in more than one way, then an additional attribute should
be added to distinguish them. For example, the different sets of skills could
represent different years’ data, and so a “year” attribute should be added to
the relation. Now employee and year will jointly identify skill sets, and nest
will be an inverse for unnest. Therefore, we define a class of ¬1NF relations
having the property that the atomic attributes of each relation and nested
relation are a key for the relation. This property is called partitioned
normal form (PNF). The PNF property is semantically equivalent to the
structuring of ¬1NF relations via the scheme trees of [OY1] or the formats of
[AB1].

employee    Skills
            skill
--------    ----------
Smith       typing
            filing
--------    ----------
Smith       sorting
            mailing
--------    ----------
Jones       typing
            dictation
            data entry
--------    ----------
Jones       sorting
            typing

Figure 1-3. ¬1NF relation which cannot be achieved using the nest operation.
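
The PNF property, that the atomic attributes of each relation and nested
relation form a key, can be tested mechanically. The following Python is a
sketch under assumed conventions rather than the formal definition of Chapter
5: tuples are frozensets of (attribute, value) pairs, any frozenset-valued
attribute is treated as a nested relation, and the helper names and the year
values are made up for illustration.

```python
def tup(**attrs):
    """Build a tuple as a hashable set of (attribute, value) pairs."""
    return frozenset(attrs.items())

def skills(*names):
    """Build a nested Skills relation with the single attribute skill."""
    return frozenset(tup(skill=n) for n in names)

def is_pnf(rel):
    """Check that the atomic attributes are a key at every nesting level."""
    seen = set()
    for t in rel:
        atomic = frozenset((a, v) for a, v in t if not isinstance(v, frozenset))
        if atomic in seen:
            return False  # two tuples agree on all atomic attributes
        seen.add(atomic)
        for _, v in t:
            if isinstance(v, frozenset) and not is_pnf(v):
                return False
    return True

# The relation of Figure 1-3: each employee appears in two tuples.
fig_1_3 = {
    tup(employee="Smith", Skills=skills("typing", "filing")),
    tup(employee="Smith", Skills=skills("sorting", "mailing")),
    tup(employee="Jones", Skills=skills("typing", "dictation", "data entry")),
    tup(employee="Jones", Skills=skills("sorting", "typing")),
}

# The same data with a distinguishing year attribute (years are invented).
with_year = {
    tup(employee="Smith", year=1985, Skills=skills("typing", "filing")),
    tup(employee="Smith", year=1986, Skills=skills("sorting", "mailing")),
    tup(employee="Jones", year=1985, Skills=skills("typing", "dictation", "data entry")),
    tup(employee="Jones", year=1986, Skills=skills("sorting", "typing")),
}
```

With these definitions is_pnf(fig_1_3) is False, because employee alone does
not identify a tuple, while is_pnf(with_year) is True once employee and year
jointly identify each skill set.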


By restricting the class of ¬1NF relations to those that satisfy the PNF
property, we are able to provide some interesting extensions to the ¬1NF
algebra operators. These new operators were inspired by the extended operators
of [AB2], and have the property that the class of PNF relations is closed
under their operation. They also maintain the implied multivalued dependencies
that exist in the 1NF counterparts of the operand relations, and so have
better semantic underpinning for ¬1NF relations than the standard relational
operators. We also define versions of these operators for the ¬1NF model which
includes null values, and prove several equivalences among the operators for
use in query optimization.

Although the class of PNF relations has certain desirable properties, there
are further normalization techniques which can be applied to ¬1NF relations.
Some standards for normalization were proposed by [Lie1], but the most
comprehensive approach to the problem is that of Ozsoyoglu and Yuan [OY1].
They define nested normal form (NNF), which structures ¬1NF relations based on
the functional and multivalued dependencies which must exist in a 1NF
database. Theirs is a decomposition approach which breaks down the universe of
attributes into a scheme tree and then splits off other scheme trees when
partial or transitive dependencies exist. We provide a way of achieving NNF
using a combined synthesis and decomposition approach which starts with a
standard decomposition of the universe of attributes into fourth normal form
(a “good” design for 1NF databases [Fag2]), employs given embedded multivalued
dependencies to improve this decomposition, and then builds the scheme trees
from this set of schemes. Our approach also improves the design by using
functional dependencies in a more meaningful way. In [OY1], only the
multivalued dependencies which are implied by the given functional
dependencies are used in the design, thereby ignoring the different semantics
of the functional dependencies. By allowing the use of embedded multivalued
dependencies and better utilizing functional dependencies, our approach can
produce ¬1NF schemes superior to those of the decomposition approach.

1.4  Sequence  of  Presentation 

The remainder of this dissertation is organized as follows. Chapter 2
summarizes the 1NF relational model, providing the basic definitions upon
which the ¬1NF relational model is based. Chapter 3 presents the basic
definitions of our ¬1NF model, and discusses previous work in the areas of
formal query languages, data dependencies, normal forms, and applications.
Chapter 4 presents equivalent formal languages for the ¬1NF model, including
an extended relational calculus and an extended relational algebra. The proof
of their equivalence is deferred until Chapter 6 so that we can introduce some
extended algebra operators which will simplify the proof development. In
Chapter 5, we introduce a class of ¬1NF relations, those that are in
“partitioned normal form,” and present some extended algebra operators for the
¬1NF relational model. The above class is closed under these operators, which
also have additional semantic motivation. Chapter 7 presents our extensions
for null values and empty nested relations. We discuss previous work on null
values, extend the ¬1NF model to include null values and empty nested
relations, and provide new definitions for the extended algebra operators of
Chapter 5 in light of the extension for nulls. In Chapter 8 we define a
user-oriented language, SQL/NF, for ¬1NF relations. Utilizing good language
design principles we formulate a high-level database query and manipulation
language based on the successful SQL data language for 1NF databases. In
Chapter 9 we present our algorithm for achieving nested normal form,
incorporating embedded multivalued dependencies and functional dependencies in
the design of ¬1NF relations. Finally, Chapter 10 presents a summary of our
contributions and suggestions for future work in this area.
Chapter 2

The 1NF Relational Model

In this chapter, we briefly present the basic characteristics of the 1NF
relational model [Cod1, Mai2, Ull]. Portions of this particular condensation
are due to Thomas [Tho] and Van Gucht [Van]. We present some basic
definitions, the relational calculus and relational algebra, data dependencies
and the various normal forms, and some brief comments on null values.

2.1  Basic  Definitions 

A domain is a set of values. If all the values in a domain are atomic (not
decomposable) it is a simple domain; otherwise it is a set-valued or complex
domain. Given a collection of atomic domains $D_1, D_2, \ldots, D_n$ (not
necessarily distinct), R is a (1NF) relation on these n sets if it is a set of
ordered n-tuples $(d_1, d_2, \ldots, d_n)$ such that $d_i \in D_i$,
$1 \le i \le n$. The value of n is the arity of R. A tuple
$(d_1, d_2, \ldots, d_n)$ has n components; the ith component is $d_i$. The
domain of an attribute A is denoted DOM(A).

A relation can be viewed as a table with each row corresponding to a tuple and
each column representing one component. Columns are usually assigned names
called attribute names. Individual attribute names are often represented by
letters near the beginning of the alphabet, e.g., A, B, ..., and sets of
attributes are represented by letters near the end of the alphabet, e.g.,
..., X, Y, Z. Instead of writing {A, B} for the set containing attribute names
A and B we use the concatenation AB. Similarly, XY is used to mean $X \cup Y$.
Lower case letters near the end of the alphabet are used to represent tuples,
e.g., s, t, .... Lower case letters p, q, r are used to represent relations
and upper case letters P, Q, R are used to represent relation schemes (defined
below). When actual names are used, the first letter is capitalized if the
name refers to a relation or a set-valued attribute, or left uncapitalized if
the name refers to an atomic attribute. Many times, set-valued attributes are
represented by following the name with an asterisk, e.g., A*, B*, ....

We will assume, without loss of generality, that all attributes of our
relations are contained in a finite universe of attributes, U. A relation
structure $\mathcal{R}$ consists of a relation scheme R and a relation r
defined on R, and is denoted $\langle R, r \rangle$. A relation scheme is
defined by a rule $R = (A_1, A_2, \ldots, A_n)$ where $A_i \in U$,
$1 \le i \le n$. The set of attributes in a relation scheme rule R is denoted
$E_R$. For $A \in E_R$, an A-value is an assignment of a value from DOM(A) to
attribute A. Generalizing this notion, an X-value, where $X \subseteq E_R$, is
an assignment of values to the attributes in X from their respective domains.
Thus, a relation r on scheme R can also be defined as a set of $E_R$-values.

The projection of relation r onto attributes X is denoted r[X], and similarly,
the projection of tuple $t \in r$ onto attributes X is denoted t[X]. We also
use t[X] to denote an X-value of t when we are talking about an arbitrary
assignment from the respective domains of each attribute in X.
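
As a concrete reading of these definitions, the sketch below represents a
relation as a Python set of attribute-value assignments and implements t[X]
and r[X]. The representation (a tuple as a frozenset of (attribute, value)
pairs) and the names tup, project_tuple and project are assumptions chosen for
illustration only.

```python
def tup(**attrs):
    """An assignment of values to attributes, stored as a hashable set."""
    return frozenset(attrs.items())

def project_tuple(t, X):
    """t[X]: restrict tuple t to the attributes in X."""
    return frozenset((a, v) for a, v in t if a in X)

def project(r, X):
    """r[X]: the set of X-values of the tuples of r (duplicates collapse)."""
    return {project_tuple(t, X) for t in r}

r = {
    tup(employee="Smith", child="Sam", skill="typing"),
    tup(employee="Smith", child="Sam", skill="filing"),
    tup(employee="Smith", child="Sue", skill="typing"),
    tup(employee="Smith", child="Sue", skill="filing"),
}
emp_child = project(r, {"employee", "child"})
```

Because a relation is a set, the four tuples of r project onto only two
(employee, child) values.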


2.2  Relational  Calculus 

We define a tuple relational calculus (TRC). This will form the basis of our
extended relational calculus in Chapter 4. We first present a calculus that
permits infinite relations and then present “safety” criteria which assure
that only finite relations can be produced from the calculus formulas.

Formulas in relational calculus are of the form $\{t \mid \psi(t)\}$, where t
is a tuple variable denoting a tuple of some fixed length, and $\psi$ is a
formula built from atoms and a collection of operators to be defined shortly.
We use $t^{(i)}$ to denote the fact that t is of arity i.

The atoms of formula $\psi$ are of three types.

1. $s \in r$, where r is a relation name and s is a tuple variable. This atom
stands for the assertion that s is a tuple in relation r.

2. $s[i] \;\theta\; u[j]$, where s and u are tuple variables and $\theta$ is
an arithmetic comparison operator (e.g., >, =). This atom stands for the
assertion that the ith component of s stands in relation $\theta$ to the jth
component of u.

3. $s[i] \;\theta\; a$ and $a \;\theta\; s[i]$, where $\theta$ and s are as in
(2) above, and a is a constant. The first of these atoms asserts that the ith
component of s stands in relation $\theta$ to the constant a, and the second
has analogous meaning.

To  define  the  operators  of  the  relational  calculus,  we  need  the  concept 
of  “free”  and  “bound”  variables  from  the  predicate  calculus.  An  occurrence  of 
a  variable  in  a  formula  is  “bound”  if  that  variable  has  been  introduced  by  a 
“for  all”  or  “there  exists”  quantifier,  and  the  variable  is  “free”  otherwise. 

Formulas, and free and bound occurrences of tuple variables in these formulas,
are defined recursively, as follows.

1. Every atom is a formula. All occurrences of tuple variables mentioned in
the atom are free in this formula.

2. If $\psi_1$ and $\psi_2$ are formulas, then $\psi_1 \wedge \psi_2$,
$\psi_1 \vee \psi_2$, and $\neg\psi_1$ are formulas. Occurrences of tuple
variables are free or bound in these formulas as they are free or bound in
$\psi_1$ or $\psi_2$, depending on where they occur.

3. If $\psi$ is a formula, then $(\exists s)(\psi)$ and $(\forall s)(\psi)$
are formulas. Occurrences of s that are free in $\psi$ are bound to
$(\exists s)$ in $(\exists s)(\psi)$ and to $(\forall s)$ in
$(\forall s)(\psi)$. Other occurrences of tuple variables in $\psi$ are free
or bound in $(\exists s)(\psi)$ and $(\forall s)(\psi)$ as they were in
$\psi$.

4. Parentheses may be placed around formulas as needed. We assume the order of
precedence is: arithmetic comparison operators highest, then the quantifiers
$\exists$ and $\forall$, then $\neg$, $\wedge$, and $\vee$, in that order.

5. Nothing else is a formula.


A tuple relational calculus expression is an expression of the form
$\{t \mid \psi(t)\}$, where t is the only free tuple variable in $\psi$.

As it stands, this definition of the TRC allows us to define some infinite
relations such as $\{t \mid \neg(t \in r)\}$, which denotes all possible
tuples that are not in r, but are of the length we associate with t. As it is
impossible to calculate the answer to this query, we must rule out such
meaningless expressions. We will do this by restricting consideration to those
expressions, called “safe,” for which it can be demonstrated that each
component of any t that satisfies $\psi$ must be a member of DOM($\psi$),
which is defined to be the set of symbols that either appear explicitly in
$\psi$ or are components of some tuple in some relation r mentioned in $\psi$.
This choice of DOM($\psi$) is not necessarily the smallest set of symbols we
could use, but it will suffice for the 1NF relational model.

We say a tuple calculus expression $\{t \mid \psi(t)\}$ is safe if:

1. Whenever t satisfies $\psi$, each component of t is a member of
DOM($\psi$).

2. For each subexpression of $\psi$ of the form $(\exists u)(\omega(u))$, if
$\omega$ is satisfied by u, then each component of u is a member of
DOM($\omega$).

3. For each subexpression of $\psi$ of the form $(\forall u)(\omega(u))$, if
any component of u is not in DOM($\omega$), then u satisfies $\omega$.
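
The practical consequence of safety is that a safe expression can be evaluated
by letting tuple variables range only over tuples built from DOM($\psi$). The
Python sketch below illustrates this for one hand-picked safe expression,
$\{t^{(1)} \mid (\exists s)(s \in r \wedge s[2] = \text{typing} \wedge t[1] = s[1])\}$.
The relation contents and the function names dom and psi are hypothetical, and
the brute-force enumeration is meant only to mirror the definition, not to
suggest an implementation strategy.

```python
from itertools import product

# A binary relation r(employee, skill); the contents are invented.
r = {
    ("Smith", "typing"),
    ("Smith", "filing"),
    ("Jones", "typing"),
    ("Brown", "filing"),
}
constants = {"typing"}  # symbols appearing explicitly in the formula

def dom(relations, constants):
    """DOM(psi): the constants of psi plus every component of every tuple
    of every relation mentioned in psi."""
    return set(constants) | {c for rel in relations for t in rel for c in t}

def psi(t, D):
    """(exists s)(s in r and s[2] = 'typing' and t[1] = s[1]).
    Components are 1-indexed in the text and 0-indexed here."""
    return any(
        s in r and s[1] == "typing" and t[0] == s[0]
        for s in product(D, repeat=2)  # s ranges over tuples built from D
    )

D = dom([r], constants)
answer = {t for t in product(D, repeat=1) if psi(t, D)}
```

The result contains exactly the employees with the typing skill. By contrast,
the unsafe expression $\{t \mid \neg(t \in r)\}$ has no such finite candidate
set, since its satisfying tuples are not confined to DOM($\psi$).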
2.3  Relational  Algebra 

Relational  algebra  refers  to  a  group  of  high  level  operators  which  are  used  to 
manipulate  relations.  Each  of  these  operators  takes  one  or  two  relations  as 
input  and  results  in  a  single  relation.  The  formal  definition  of  the  algebra  can 
be  found  in  [Cod3,  Ull].  Here  we  provide  only  the  definitions  of  the  operators 
themselves. 

Since  relations  are  sets  of  tuples,  the  usual  set  operators,  union,  set 
difference  and  Cartesian  product  apply.  These  three,  along  with  the  special 
relational  operators  projection  and  selection,  form  a  relationally  complete  set. 
Relationally  complete  means  that  any  derivable  relations  can  be  retrieved  from 
the  database  using  only  this  set  of  operators.  We  provide  also  definitions  of 
intersection,  theta-join,  and  natural  join  which  are  derivable  from  the  basic 
operator  set.  In  the  following,  let  r  and  q  be  relations. 

1. Union: The union of r and q, denoted $r \cup q$, is the set of all tuples
belonging to either r or q, or both. Relations r and q must be of the
same arity, say n, and the jth attribute of each relation must be drawn
from the same domain ($1 \le j \le n$). When these conditions hold for
two relations, they are said to be union-compatible.

2. Intersection: The intersection of r and q, denoted $r \cap q$, is the set
of all tuples belonging to both r and q. Relations r and q must be
union-compatible.

3. Difference: The difference of r and q, denoted $r - q$, is the set of all
tuples in r but not in q. Relations r and q must be union-compatible.

4. Cartesian product: The Cartesian product of r and q, denoted $r \times q$,
is the set of all tuples that are a concatenation of a tuple from r and
a tuple from q.

5. Projection: The projection of r over attributes $A_1, A_2, \ldots, A_n$, denoted
$\pi_{A_1, A_2, \ldots, A_n}(r)$, is the relation obtained by deleting all columns
in r except those that are identified by attributes $A_1, A_2, \ldots, A_n$ and
then eliminating duplicate tuples. In formal proofs we will also use
attribute numbers, $1, 2, \ldots, k$, where k is the arity of r, instead of
attribute names. Attribute number i corresponds to attribute name $A_i$.

6. Selection: The selection of those tuples in r satisfying predicate F,
denoted $\sigma_F(r)$, is the subset of the tuples in r that satisfy F.
The predicate F is a formula built from operands that are constants
or attribute names (or numbers), arithmetic comparison operators,
and the logical operators $\wedge$, $\vee$, and $\neg$.

7. Theta-join: Let A be an attribute in r and B an attribute in q. The
theta-join of r and q, denoted

$r \bowtie_{A \theta B} q$,

is the set of all concatenations of a tuple $t_r$ from r and a tuple $t_q$ from q such
that $t_r[A]$ stands in relation $\theta$ to $t_q[B]$. When $\theta$ is equality the operation is
called an equijoin.

8. Natural join: Let r be a relation on scheme R and q a relation on
scheme Q. Let $X = E_R \cap E_Q$ and $Y = E_R \cup E_Q$. The natural join of
r and q, denoted $r \bowtie q$, is the projection onto Y of an equijoin where
the equality test is performed on each attribute in X.

The following relationships show how intersection, theta-join, and natural
join can be derived from the basic set of operators.

1. $r \cap q = r - (r - q)$.

2. $r \bowtie_{A \theta B} q = \sigma_{A \theta B}(r \times q)$.

3. $r \bowtie q = \pi_Y(\sigma_{r.A_1 = q.A_1 \wedge r.A_2 = q.A_2 \wedge \cdots \wedge r.A_n = q.A_n}(r \times q))$, where
$A_1, A_2, \ldots, A_n$ are the common attributes of r and q, renamed to be
unique by prepending r. or q., as appropriate, and Y is the union of
the set of attributes of r and q.
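
The derivations above can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the formal development: relations are modeled as Python sets of frozensets of (attribute, value) pairs, and the helper names (`row`, `project`, `select`, `product`, `theta_join`, `natural_join`) as well as the fixed prefixes `r.` and `q.` are assumptions made for the example.

```python
def row(**attrs):
    """Build one hashable tuple as a frozenset of (attribute, value) pairs."""
    return frozenset(attrs.items())

def project(r, attrs):
    # Building a set eliminates duplicate tuples automatically.
    return {frozenset((a, v) for a, v in t if a in attrs) for t in r}

def select(r, predicate):
    # predicate receives each tuple as a dict from attribute name to value.
    return {t for t in r if predicate(dict(t))}

def product(r, q):
    # Prefix attribute names with "r." and "q." so that they stay unique.
    return {
        frozenset([(f"r.{a}", v) for a, v in tr] + [(f"q.{a}", v) for a, v in tq])
        for tr in r
        for tq in q
    }

def intersection(r, q):
    # Derivation 1: intersection expressed with difference only.
    return r - (r - q)

def theta_join(r, q, a, b, theta):
    # Derivation 2: a selection applied to the Cartesian product.
    return select(product(r, q), lambda t: theta(t[f"r.{a}"], t[f"q.{b}"]))

def natural_join(r, r_attrs, q, q_attrs):
    # Derivation 3: equality selection on the common attributes, then
    # projection onto the union of the attribute sets (prefixes dropped).
    common = set(r_attrs) & set(q_attrs)
    matched = select(
        product(r, q),
        lambda t: all(t[f"r.{a}"] == t[f"q.{a}"] for a in common),
    )
    joined = set()
    for t in matched:
        t = dict(t)
        merged = {a: t[f"r.{a}"] for a in r_attrs}
        merged.update({a: t[f"q.{a}"] for a in q_attrs})
        joined.add(frozenset(merged.items()))
    return joined
```

The attribute lists are passed to `natural_join` explicitly because an empty relation carries no tuples from which its scheme could be read off.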

We adopt the following convention for attribute names in Cartesian
products of relations: We shall use the notation relation-name.attribute-name
only when necessary to avoid ambiguity. When no ambiguity results, we shall
drop the relation-name prefix.

2.4  Data  Dependencies 

Each relation in a relational database may be expected to reflect certain
associations among the stored data. For example, in a relation containing data
about  employees  we  might  expect  each  employee  number  to  have  associated 
with  it  a  unique  name,  address,  and  telephone  number.  On  the  other  hand, 
many  employees  may  have  the  same  name.  Such  constraints  on  the  contents  of 
a  database  are  termed  data  dependencies. 

A relationship in which a single value of one set of attributes is related
to the value of a second set of attributes is called a functional dependency (FD).
Let r be a relation on scheme R, with X and Y subsets of $E_R$. Relation r
satisfies the functional dependency $X \to Y$ if for every pair of tuples $t_1$ and $t_2$
in r, if $t_1[X] = t_2[X]$, then $t_1[Y] = t_2[Y]$.

Functional dependencies provide us with a way to define formally the
notion of a “key”. Let r be any relation on scheme R, and $X \subseteq E_R$. X is a
key of R if $X \to E_R$ holds in r, and there does not exist a proper subset Y of
X, such that $Y \to E_R$ holds in r. A superkey is any set of attributes which
contains a key.
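
As a small illustration of these two definitions, the following Python sketch checks an FD and a key against a concrete relation instance. It is a sketch under stated assumptions: a relation is modeled as a list of dicts, and the function names `satisfies_fd` and `is_key` are hypothetical, introduced only for this example.

```python
from itertools import combinations

def satisfies_fd(r, X, Y):
    """Return True if the instance r (a list of dicts) satisfies X -> Y."""
    seen = {}
    for t in r:
        x_val = tuple(t[a] for a in X)
        y_val = tuple(t[a] for a in Y)
        # Two tuples that agree on X must also agree on Y.
        if seen.setdefault(x_val, y_val) != y_val:
            return False
    return True

def is_key(r, X, all_attrs):
    """Return True if X determines every attribute and no proper subset does."""
    X = sorted(X)
    all_attrs = sorted(all_attrs)
    if not satisfies_fd(r, X, all_attrs):
        return False
    # Proper subsets have sizes 0 .. len(X) - 1.
    return not any(
        satisfies_fd(r, subset, all_attrs)
        for size in range(len(X))
        for subset in combinations(X, size)
    )
```

With an employee relation in which two employees share a name, `eno -> name` holds while `name -> eno` does not, and `{eno, name}` is a superkey but not a key because `{eno}` alone already determines every attribute.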


A relationship in which a set of values associated with one set of
attributes is related to the value of a second set of attributes, independent of the
other attributes in the relation, is called a multivalued dependency (MVD). Let r
be any relation on scheme R, with X and Y subsets of $E_R$ and $Z = E_R - XY$.
Relation r satisfies the multivalued dependency $X \twoheadrightarrow Y$ if for every pair of
tuples $t_1$ and $t_2$ in r, if $t_1[X] = t_2[X]$, then there exists a tuple $t_3$ in r with

$t_3[X] = t_1[X]$, $t_3[Y] = t_1[Y]$, and $t_3[Z] = t_2[Z]$.
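
The MVD definition translates directly into a brute-force check over pairs of tuples. The sketch below is illustrative only; the list-of-dicts representation and the name `satisfies_mvd` are assumptions made for the example, and no attempt is made at efficiency.

```python
def satisfies_mvd(r, X, Y, all_attrs):
    """Return True if the instance r (a list of dicts) satisfies X ->-> Y."""
    X, Y = list(X), list(Y)
    # Z = E_R - XY: every attribute that is in neither X nor Y.
    Z = [a for a in all_attrs if a not in X and a not in Y]

    def proj(t, attrs):
        return tuple(t[a] for a in attrs)

    present = {(proj(t, X), proj(t, Y), proj(t, Z)) for t in r}
    for x1, y1, _ in present:
        for x2, _, z2 in present:
            # t1 and t2 agree on X, so t3 = (X of t1, Y of t1, Z of t2) must exist.
            if x1 == x2 and (x1, y1, z2) not in present:
                return False
    return True
```

For a relation over course, teacher, and book in which every teacher of a course is paired with every book of that course, `course ->-> teacher` holds; removing any single tuple from such a relation breaks the dependency.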

An MVD is said to be embedded if the MVD holds on a projection of
the relation. Let r be any relation on scheme R, $Z \subseteq E_R$, and $X \subseteq Z$, $Y \subseteq Z$.
Relation r satisfies the embedded multivalued dependency (EMVD) $X \twoheadrightarrow Y \mid Z - XY$
when the MVD $X \twoheadrightarrow Y$ holds in $\pi_Z(r)$. If an MVD or EMVD, G, holds in
a relation r with attributes Z, then the projection of that dependency on a set
of attributes $Y \subseteq Z$, denoted $proj_Y(G)$, holds in $\pi_Y(r)$ if and only if the left
hand side of G is a subset of Y. A dependency is projected on Y by eliminating
all attributes on the right hand side that are not in Y.

We use several facts about MVDs. Let U be the universe of attributes,
X a set of attributes, and M a set of multivalued dependencies. A dependency
basis for X, denoted $DEP_M(X)$, or DEP(X) when M is understood, is a
partition of $U - X$ into sets of attributes $Y_1, Y_2, \ldots, Y_n$, such that if $Z \subseteq U - X$,
then $X \twoheadrightarrow Z$ if and only if Z is the union of some of the $Y_i$'s. For a set M of
MVDs, $M^+$ denotes the closure of M, i.e., the set of all MVDs that are implied
by M. Given two sets M and N of MVDs, M is a cover of N if $M^+ = N^+$.
Many times we want to work with a minimum cover for a set of MVDs.

Definition 2.1: [OY1] Given a set M of MVDs over U, an MVD $X \twoheadrightarrow W$ in
$M^+$ is said to be

(a) trivial if $XW = U$, $W = \emptyset$ or $W \subseteq X$,

(b) left-reducible if $\exists X'$, $X' \subset X$, such that $X' \twoheadrightarrow W$ is in $M^+$,

(c) right-reducible if $\exists W'$, $W' \subset W$, such that $X \twoheadrightarrow W'$ is in $M^+$,

(d) transferable if $\exists X'$, $X' \subset X$, such that $X' \twoheadrightarrow W(X - X')$ is in $M^+$.

An MVD $X \twoheadrightarrow W$ is said to be reduced if it is nontrivial, left-reduced,
right-reduced, and nontransferable. A set of MVDs M is said to be a minimum cover
if every MVD in M is reduced, and no proper subset of M is a cover of M.

We use LHS(M) to denote the set of left hand sides of the MVDs in
a set M of MVDs. Let $M^-$ be the set of all reduced MVDs implied by M, N be
a set of MVDs, and M be a minimal cover of N. Then elements in $LHS(M^-)$
are called keys of N, and the elements in LHS(M) are called essential keys of
N. Elements in $LHS(M^-) - LHS(M)$ are called nonessential keys of N.

A relationship which states that certain projections of a relation must
join (natural join) to the original relation is called a join dependency (JD). Let
r be any relation on scheme R and let $\mathbf{R} = \{R_1, R_2, \ldots, R_n\}$ be a set of schemes
which are projections of scheme R. Relation r satisfies the join dependency
$\bowtie(R_1, R_2, \ldots, R_n)$ if r decomposes losslessly onto $R_1, R_2, \ldots, R_n$. That is,

$r = \pi_{R_1}(r) \bowtie \pi_{R_2}(r) \bowtie \cdots \bowtie \pi_{R_n}(r)$.
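
The join dependency condition can be checked on a given instance by projecting and rejoining. The following Python sketch is an illustration under assumptions: relations are sets of frozensets of (attribute, value) pairs, the function names are hypothetical, and the list of schemes is assumed to be nonempty and to cover every attribute of r.

```python
def project(r, attrs):
    # Keep only the named attributes; the set removes duplicate tuples.
    return {frozenset((a, v) for a, v in t if a in attrs) for t in r}

def natural_join(r, q):
    joined = set()
    for tr in r:
        for tq in q:
            dr, dq = dict(tr), dict(tq)
            # Tuples combine only if they agree on all shared attributes.
            if all(dr[a] == dq[a] for a in dr.keys() & dq.keys()):
                joined.add(frozenset({**dr, **dq}.items()))
    return joined

def satisfies_jd(r, schemes):
    """Return True if r equals the natural join of its projections."""
    joined = project(r, schemes[0])
    for scheme in schemes[1:]:
        joined = natural_join(joined, project(r, scheme))
    return joined == r
```

The join of the projections always contains r, so a failure of the test means that the join introduced spurious tuples.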


If a join dependency is equivalent to a set of multivalued dependencies then
$\mathbf{R}$ is an acyclic set of schemes. Acyclic schemes have several good properties
described in [BFMY, Sac], including the fact that the set of multivalued
dependencies which is equivalent to a join dependency is conflict free. Properties
of conflict free dependencies are described in [AC, BeK1, BeK2, Sci1, Sci3,
Sci4]. Sciore [Sci3] states that in “real world” situations, every “natural” set
of MVDs must be conflict free. Conflict free sets have the desirable property
that they allow a unique fourth normal form dependency preserving database
scheme; moreover, non-conflict free sets have no such normalization.
(Normalization and normal forms are discussed in the next section.) Fixing a particular
scheme R, the set of all relations on R that satisfy a set of dependencies D is
denoted $SAT_R(D)$, or when R is understood, SAT(D).

Various other dependencies have been proposed and studied; however,
the FD, MVD, EMVD and JD are the ones most important in the area of
database design and normalization. Functional dependencies have been studied
in [Hon, Men1], and multivalued dependencies have been studied in [Bis3,
Bis4, Fag2, Han, Men2, OY2, Sci3]. Embedded multivalued dependencies have
received particular attention by [PP, TKY] and the related template
dependencies by [Sad, SU1, SU2]. A comprehensive look at join dependencies was done
by [ABU].

It is well known that these dependencies satisfy a number of inference
rules. We reproduce the list of [PP], which was assembled from various
other sources, with a correction for FD-MVD2. Below it is understood that
T, V, W, X, Y, Z represent sets of attributes, and U is the universe of all
attributes.

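FD1 (Reflexivity): If $Y \subseteq X$, then $X \to Y$.

FD2 (Augmentation): If $Z \subseteq V$ and $X \to Y$, then $XV \to YZ$.

FD3 (Transitivity): If $X \to Y$ and $Y \to Z$, then $X \to Z$.

FD4 (Pseudo-transitivity): If $X \to Y$ and $YV \to Z$, then $XV \to Z$.

FD5 (Union): If $X \to Y$ and $X \to Z$, then $X \to YZ$.

FD6 (Decomposition): If $X \to YZ$, then $X \to Y$ and $X \to Z$.

MVD0 (Complementation): Given $U = XYZ$ and $Y \cap Z \subseteq X$,
$X \twoheadrightarrow Y$ iff $X \twoheadrightarrow Z$.

MVD1 (Reflexivity): If $Y \subseteq X$, then $X \twoheadrightarrow Y$.

MVD2 (Augmentation): If $Z \subseteq V$ and $X \twoheadrightarrow Y$, then $XV \twoheadrightarrow YZ$.

MVD3 (Transitivity): If $X \twoheadrightarrow Y$ and $Y \twoheadrightarrow Z$, then $X \twoheadrightarrow Z - Y$.

MVD4 (Pseudo-transitivity): If $X \twoheadrightarrow Y$ and $YV \twoheadrightarrow Z$,
then $XV \twoheadrightarrow Z - YV$.

MVD5 (Union): If $X \twoheadrightarrow Y$ and $X \twoheadrightarrow Z$, then $X \twoheadrightarrow YZ$.

MVD6 (Decomposition): If $X \twoheadrightarrow Y$ and $X \twoheadrightarrow Z$, then $X \twoheadrightarrow Y \cap Z$,
$X \twoheadrightarrow Y - Z$, and $X \twoheadrightarrow Z - Y$.

FD-MVD1: If $X \to Y$, then $X \twoheadrightarrow Y$.

FD-MVD2: If $X \twoheadrightarrow Z$ and $Y \to V$ where $V \subseteq Z$ and $Y \cap Z = \emptyset$,
then $X \to V$.

FD-MVD3: If $X \twoheadrightarrow Y$ and $XY \to Z$, then $X \to Z - Y$.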

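EMVD0 (Complementation): If $X \twoheadrightarrow Y \mid Z$, then $X \twoheadrightarrow Z \mid Y$.

EMVD1 (Projection): If $X \twoheadrightarrow YZ \mid V$, then $X \twoheadrightarrow Y \mid V$.

EMVD2 (Root Weighting): If $X \twoheadrightarrow YZ \mid V$, then $XY \twoheadrightarrow Z \mid V$.

EMVD3 (Decomposition): If $X \twoheadrightarrow Y \mid ZV$ and $XY \twoheadrightarrow Z \mid V$,
then $X \twoheadrightarrow Z \mid V$.

EMVD4 (Intersection): If $X \twoheadrightarrow Y \mid Z$ and $X \twoheadrightarrow V \mid W$ where $Y \cap V \ne \emptyset$
and $Y \cap W \ne \emptyset$, then $X \twoheadrightarrow Y \cap V \mid Y \cap W$.

EMVD5 (Pseudo-transitivity): If $X \twoheadrightarrow Y \mid ZVW$ and $YZ \twoheadrightarrow V \mid XT$ with
X, Y, Z, V, W disjoint and X, Y, Z, V, T
disjoint, then $XZ \twoheadrightarrow V \mid YT$.

MVD-EMVD1 (Joinability): $XY \twoheadrightarrow Z$ and $X \twoheadrightarrow Y \mid Z$ iff $X \twoheadrightarrow Z$.

MVD-EMVD2 (Union): If $XY \twoheadrightarrow VW$ and $XZ \twoheadrightarrow VT$ where $T \subseteq Y$
and $W \subseteq Z$, and $X \twoheadrightarrow Y \mid Z$, then $X \twoheadrightarrow VWT$.

FD-EMVD1: If $X \twoheadrightarrow Y \mid Z$ and $Y \to Z$, then $X \to Z$.

FD-EMVD2: If $X \twoheadrightarrow Y \mid Z$ and $XY \to V$, then $X \twoheadrightarrow YV \mid Z$.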

We note that the inference rules FD1-FD6, MVD0-MVD6, FD-MVD1, and
FD-MVD2 form a sound and complete axiomatization of FDs and MVDs [BFH].
Parker and Parsaye-Ghomi [PP] showed that there can be no finite set of
inference rules for EMVDs, based on the assumption that an arbitrary number of
attributes is available. Should the number of attributes be fixed, there must be
a complete set of rules, although the cardinality of this set will be large (and
as of yet, still undiscovered).
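
For the FD rules alone, implication can be decided mechanically with the standard attribute-closure computation, which is not part of the rule list above but follows from FD1-FD3. The sketch below is a minimal illustration; the names `closure` and `implies` and the representation of an FD as a pair of attribute sets are assumptions made for the example.

```python
def closure(attrs, fds):
    """Return the set of attributes functionally determined by attrs.

    fds is a list of (lhs, rhs) pairs, each side a set of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already determined, so is the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def implies(fds, lhs, rhs):
    """Return True if lhs -> rhs follows from fds."""
    return set(rhs) <= closure(lhs, fds)
```

For instance, with the premises of FD4, `X -> Y` and `YV -> Z`, the closure of `{X, V}` contains `Z`, which matches the conclusion `XV -> Z`, while the closure of `{X}` alone does not.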

2.5  Normal  Forms 

A “normal form” is a restriction on the database scheme that precludes certain
undesirable properties from the database. Most of these undesirable properties
deal with update (including insert and delete) anomalies and redundancy in
the database. A number of different normal forms for relation schemes with
dependencies have been defined so that some of the anomalies and redundancy
are no longer present in the database. Since they will play a part in ¬1NF
database normalization, we will consider two normal forms here: Boyce-Codd
Normal Form [Cod3, LeP] and Fourth Normal Form [Fag2]. The two definitions
that follow are slightly different than usual as they do not require that the
relation be in 1NF; that is, that all domains be simple. As Kobayashi [Kob]
pointed out, the 1NF restriction is not strictly needed in the definition of these
normal forms; however, traditional dependency theory is not capable of dealing
with complex domains and so the 1NF restriction is added to the definitions.

A relation scheme R with FDs F is said to be in Boyce-Codd Normal
Form (BCNF) with respect to F, if whenever $X \to A$ holds in any relation r
on scheme R, and $A \notin X$, then $X \to E_R$ holds in r; that is, X is a key.

Now, let R be a relation scheme and D a set of FDs and MVDs. We
say that R is in Fourth Normal Form (4NF) with respect to D, if whenever
there is an MVD $X \twoheadrightarrow Y$, where $Y \ne \emptyset$, $Y \not\subseteq X$, and $XY \ne E_R$, then $X \to E_R$;
that is, X is a key. We note that 4NF implies BCNF.

Other normal forms include 2NF and 3NF which are defined based
on schemes being free from partial and transitive dependencies, respectively
[Cod1], an improved 3NF which removes superfluous attributes from schemes
[LTK], Project/Join Normal Form (PJ/NF) based on the two operators
projection and natural join [Fag3], and Domain/Key Normal Form (DK/NF), an
ultimate normal form based only on domain and key constraints (as yet
unattainable in general) [Fag1]. Note that 2NF and the two 3NFs are defined when only
FDs are present, PJ/NF for JDs, MVDs and FDs, and DK/NF for arbitrary
constraints.

The goal of database design is to produce a set of schemes which
exhibits the good properties espoused by the various normal forms [LST, ZM].
Two approaches are generally used in the design algorithms: decomposition
and synthesis. The decomposition method assumes a universal relation [FMU]
containing all attributes of the database exists, and then proceeds to
decompose this scheme based on the dependencies to be satisfied, and the normal
form to be achieved. A decomposition of a scheme is its replacement by a
collection $\rho = \{R_1, R_2, \ldots, R_n\}$, where $E_{R_i} \subseteq E_R$, $1 \le i \le n$, and
$\bigcup_{i=1}^{n} E_{R_i} = E_R$.

The decomposition $\rho$ is a lossless join decomposition with respect to a set of
dependencies D if for every relation r on scheme R satisfying D:

$r = r[E_{R_1}] \bowtie r[E_{R_2}] \bowtie \cdots \bowtie r[E_{R_n}]$.

The synthesis method starts with the attributes in each dependency and
synthesizes a set of schemes which meet the goals of the normal form. In the next
chapter, we will see that there are disadvantages to vertical decomposition and
that ¬1NF is a viable alternative.

2.6  Null  Values 

Throughout this discussion we have ignored the concept of null values in the
database. Null values indicate the lack or nonexistence of information in the
database. They do not lend themselves well to the rigorous analysis that
applies to most other aspects of the relational model. However, there has been
substantial research in this area which is summarized in Chapter 7. It is in that
chapter that we discuss the role of the null value in both the 1NF and ¬1NF
relational models. Until then we assume that null values are not allowed in the
database.
Chapter 3

The ¬1NF Relational Model


In this chapter, we describe several aspects of the ¬1NF relational model. We
examine various database models which have been proposed for dealing with
non-atomic domains and define the particular model we will be using in this
dissertation. We will then take a brief look at previous work done in the areas of
query languages, dependency theory, normal forms, and applications for ¬1NF
relations.


3.1 ¬1NF Database Models


One of the chief benefits derived from working with the relational approach to
databases is that it can be couched within the formalism of first-order predicate
logic. As a result many important issues can be addressed mathematically when
one assumes the database is relational. However, when the 1NF assumption is
not made, we need an analogous formalism that will serve the ¬1NF approach.
There are several existing models which have the characteristics we require.


The database abstractions of Smith and Smith [SmS] model aggregation
and generalization of data. We are interested in the ability to aggregate
simple domains into complex domains, but not the classification of domains
that generalization allows. Our extensions are developed for a simpler model
in which generalization is not allowed. However, generalization can be
simulated in our model by using a “class” attribute to distinguish tuples in different
generalization categories. For example, the “vehicle” domain is a generalization
of “car,” “truck,” and “bus” domains. We would add an attribute,
“vehicle-type,” to distinguish data elements of these different classes within the “vehicle”
domain. Abiteboul [Abi] extends this work by introducing disaggregation, an
inverse of the aggregation concept. Disaggregation can be regarded as a
columnwise index which maps each value in a particular column into a tuple. The
concept of indexing is taken to the extreme in Orman’s indexed data sets [Orm].
In this model the values of one attribute are used as an index for the values of
other attributes using binary associations. This has the effect of partitioning
relations via the indices.

A more restricted model for non-first-normal-form relations is the
Verso model [B+, AB1], where instances are defined over a format. A format
is recursively defined by:

(i) let X be a finite string of attributes with no repeated attribute, then
X is a format over the set X of attributes, and

(ii) let X be a finite string of attributes with no repeated attribute, X
nonempty, and $f_1, f_2, \ldots, f_n$ some formats over $Y_1, Y_2, \ldots, Y_n$, respectively,
such that the sets $X, Y_1, Y_2, \ldots, Y_n$ are pairwise disjoint, then the
string $X(f_1)^*(f_2)^* \cdots (f_n)^*$ is a format over the set $X Y_1 Y_2 \cdots Y_n$.

Let tup(X) be a set of X-values. An instance over a format f, denoted inst(f),
is recursively defined by:

(i) if $f = X$ then inst(f) is a finite subset of tup(X), and

(ii) if $f = X(f_1)^*(f_2)^* \cdots (f_n)^*$ then I is in inst(f) if and only if

(a) I is a finite subset of $tup(X) \times inst(f_1) \times inst(f_2) \times \cdots \times inst(f_n)$, and

(b) if $(u, I_1, I_2, \ldots, I_n)$ and $(u', I'_1, I'_2, \ldots, I'_n)$ are in I then $u \ne u'$
or $(u, I_1, I_2, \ldots, I_n) = (u', I'_1, I'_2, \ldots, I'_n)$.

The (a) condition states that I is atomic on the attributes in X and not atomic
on the “attributes” $f_1, f_2, \ldots, f_n$. The (b) condition forces X to be a key. This
is a large restriction on what ¬1NF relations can be. We will look at the
advantages of such a restriction in section 3.4.

The format model of Hull and Yap [HY] recursively builds formats
using the three data constructs: collection, composition, and classification.
Composition and classification are closely related to aggregation and generalization,
respectively, of [SmS]. Collection allows one to specify the formation of sets of
objects, all of a given type. The database logic of Jacobs [Jac1-3] is a
framework for a heterogeneous database which can serve the relational, hierarchical,
and network approaches. Kuper and Vardi [KV1] have a modified database
logic which also models virtual records, introducing cyclicity at the schema
level, and solves the problems of noncomputable queries present in Jacobs’s logic.
Kuper and Vardi’s logic specifies database instances by r-values for the data
space, and l-values for the address space. An instance of the database is a
set of l-values and their associated r-values. This makes query languages very
cumbersome to use since the user must know about and manipulate
“conceptual addresses” throughout the database. Additionally, database logic is too
powerful for our purposes. In particular, Kuper and Vardi [KV2] show that
their algebra is equivalent to an algebra which includes the power set operator,
which is not expressible (see Appendix A) with the basic relational operators
or the extended algebra operators for ¬1NF relations (see Chapter 4). Other
more powerful models include the Graph Data Model described in [Kun] and
the various semantic data models such as the Functional Data Model described
in [Shi].

We follow the lead of Fischer and Thomas [FT] and adopt a formalism
adapted from the database logic of Jacobs. Some of the following description
of our ¬1NF model is taken from [FT].

A database scheme S is a collection of rules of the form

$R_j = (R_{j_1}, R_{j_2}, \ldots, R_{j_n})$.

The objects $R_j$, $R_{j_i}$, $1 \le i \le n$, are attributes. $R_j$ is a higher order attribute if
it appears on the left hand side of some rule; otherwise it is zero order. Each
zero order attribute has an associated domain from which the attribute's values
are drawn. The attributes on the right hand side of rule $R_j$ form a set denoted
$E_{R_j}$, the elements of $R_j$. As with any set, attributes on the right hand side of
the same rule are unique, and to avoid ambiguity, no two rules can have the
same attribute on the left hand side.

[Figure 3-1. A sample relation on the Emp scheme. The Employee table has
columns ename, Children, and Skills; Skills has columns type and Exams;
Exams has columns year and city.]

To  illustrate  this,  consider  the  following  database  scheme. 

Emp  =  (ename,  Children,  Skills), 

Children  =  (name,  dob), 

Skills  =  (type,  Exams), 

Exams  =  (year,  city). 

In  this  scheme  each  employee  has  a  set  of  children  each  with  a  name  and 
birthdate,  and  a  set  of  skills,  each  with  a  skill  type  and  a  set  of  exam  years 
and  cities,  when  and  where  the  employee  retested  his  proficiency  at  the  skill. 
A sample relation is shown in Figure 3-1.

In this example, the higher order attributes are Emp, Children, Skills
and Exams. All others are zero order attributes. An attribute $R_j$ is external if
it appears only on the left hand side of some rule, otherwise it is internal. Thus
in the above example, Emp is external while all other attributes are internal.
We often are concerned with an individual table or relation scheme,
not with the entire database. Let $R_j$ be an external attribute in database
scheme S. The rules in S which are accessible from $R_j$ form a subscheme of S,
defined as follows:

1. $R_j = (R_{j_1}, R_{j_2}, \ldots, R_{j_n})$ is in the subscheme, and

2. When a higher order attribute $R_k$ is on the right hand side of some
rule in the subscheme, the rule $R_k = (R_{k_1}, R_{k_2}, \ldots, R_{k_m})$ is also in the
subscheme.

A  subscheme  is  called  a  relation  scheme  if  in  addition: 

3.  No  zero  order  attribute  appears  on  the  right  hand  side  of  two  different 
rules  in  the  scheme. 

For  example,  consider  the  employee  database  scheme.  The  subscheme  starting 
with  Emp  contains  the  rules  for  Emp,  Children,  Skills  and  Exams,  and  the 
subscheme  starting  with  Children  contains  only  the  rule  for  Children.  Since 
there  are  no  zero  order  attributes  appearing  in  more  than  one  rule,  both  of 
these  subschemes  are  also  relation  schemes. 
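
The classification of attributes and the construction of subschemes can be made concrete with a short Python sketch. It assumes, for illustration only, that a database scheme is stored as a dict mapping each left hand side to the tuple of attributes on its right hand side; the function names are hypothetical.

```python
# The employee database scheme from the text, one dict entry per rule.
SCHEME = {
    "Emp": ("ename", "Children", "Skills"),
    "Children": ("name", "dob"),
    "Skills": ("type", "Exams"),
    "Exams": ("year", "city"),
}

def rhs_attributes(scheme):
    return {a for rhs in scheme.values() for a in rhs}

def higher_order(scheme):
    # Attributes appearing on the left hand side of some rule.
    return set(scheme)

def zero_order(scheme):
    return rhs_attributes(scheme) - set(scheme)

def external(scheme):
    # Attributes appearing only on the left hand side of some rule.
    return set(scheme) - rhs_attributes(scheme)

def subscheme(scheme, root):
    """Collect the rules accessible from root (conditions 1 and 2)."""
    rules, pending = {}, [root]
    while pending:
        name = pending.pop()
        if name in rules:
            continue
        rules[name] = scheme[name]
        pending.extend(a for a in scheme[name] if a in scheme)
    return rules

def is_relation_scheme(rules):
    """Condition 3: no zero order attribute occurs in two different rules."""
    seen = set()
    for rhs in rules.values():
        for a in rhs:
            if a in rules:        # higher order attribute, not checked here
                continue
            if a in seen:
                return False
            seen.add(a)
    return True
```

Applied to the employee scheme, `external(SCHEME)` yields only Emp, and the subschemes rooted at Emp and at Children both pass `is_relation_scheme`, in agreement with the example in the text.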

A 1NF database scheme is a collection of rules of the form
$R_j = (R_{j_1}, R_{j_2}, \ldots, R_{j_n})$ where all the $R_{j_i}$ are zero order. A ¬1NF scheme, however,
may contain any combination of zero or higher order attributes on the right
hand side of the rules as long as the scheme remains nonrecursive†. Note that
a nested relation is represented simply as a higher order attribute on the right
hand side of a rule.

As in the 1NF relational model, let R be an attribute appearing in a
database scheme S. An instance of R, written r, is an ordered pair of the form
$(R, V_R)$ where $V_R$ is a value for attribute R. When R is a zero order attribute,
$V_R$ is just any value from the domain of R. When R is a higher order attribute,
$V_R$ must be expanded in terms of the attributes on the right hand side of rule
R. We will omit the attribute name in an instance specification when the name
is understood from the context.

Two schemes $R_i$ and $R_j$ are equal if they are comprised of the same
rules. In order for two structures to be equal, their schemes and instances must
be equal. Two instances $r_1$ and $r_2$ of equal relation schemes $R_i$ and $R_j$ are
equal if the identity mapping is an isomorphism from $r_1$ to $r_2$.

3.2 Formal Query Languages for ¬1NF Relations

Extensions to relational calculus and relational algebra languages to support
¬1NF relations began by adding one-level nest and unnest operators to the
basic relational algebra. Jaeschke and Schek [Jae1, JS] defined one-level nest
and unnest operators and extended selection predicates to include containment
and subset comparison operators. Ozsoyoglu, Ozsoyoglu, and Matos [OOM2]
extend the relational algebra and calculus for set-valued attributes
(single-attribute, one-level nests) and aggregate functions (e.g., MAX, SUM, AVG).
Language extensions for aggregate functions can also be found in [Eps, Klu].
Arisawa, Moriya, and Miura [AMM] take single-attribute, one-level nesting to
the extreme by nesting every attribute of the relation and studying operations
on these relations.

† Recursive schemes are beyond the scope of this dissertation.

Multi-attribute, multi-level nesting was first studied by Fischer and
Thomas [FT, TF] and Abiteboul and Bidoit [AB2]. Fischer and Thomas
extend the basic relational algebra with multi-attribute nest and unnest operators
and study the interaction of these operators with the traditional algebra
operators. Due to the Verso model’s more restricted nature (see section 3.1),
Abiteboul and Bidoit introduce extended algebra operators in addition to nest
and unnest, which maintain the underlying semantics of their relations. Some
of these operators are refined and formally presented in Chapter 5. An attempt
is also made in [AB2] to define a select operator which operates in a recursive
manner to select tuples from nested relations. Jaeschke [Jae2] also has a
proposal for an algebra similar to [FT], but with additional local algebra operators
which operate within nested relations that occur in every tuple. The full power
of a recursive algebra in which operators can be nested within other algebra
operators has been proposed in [Jae3, Sch2, ScS1, ScS2].

The algebra operators of the recursive algebras and extensions to
include set comparison operators can all be expressed in terms of a basic relational
algebra and the addition of multi-attribute nest and unnest. This is the algebra
we present in Chapter 4, along with a new calculus of equivalent power.

3.3 Dependencies for ¬1NF Relations

Two primary directions have been taken in the area of dependency theory as
applied to ¬1NF relations. One direction has been to define new dependencies
directly on ¬1NF relations, while the other direction involves using
dependencies defined on 1NF relations and investigating their consequences in
the nested counterparts of those relations.

3.3.1 New Dependencies for ¬1NF Relations

Some researchers [Kob, Mak, Tho] have extended the usual definitions of
dependency by simply extending the notion of equality expressed in these
definitions to include set-equality when higher order attributes are involved.
For example, in Figure 3-1, the extended FDs, ename → Children and
ename → Skills, hold in the Employee relation. Furthermore, we would expect
them to hold on any relation over the Emp scheme. These extended dependencies
are generally unable to cope with nested relations. Looking at Figure 3-1
again, we see that the extended FD, type → Exams, holds in each nested Skills
relation. However, if we unnested Employee on the Skills attribute, that same
FD would no longer hold. Thus, the concept of "local" dependency [Tho, Van]
was introduced. A dependency is local if it holds within a nested relation.
If the dependency holds in the nested relation of each tuple, throughout the
relation it is nested in, then the dependency is said to be uniformly local.
The usual dependency which must hold on an entire relation is now called
global. Several interesting results were discovered by Thomas [Tho] concerning
the interaction of global and uniformly local dependencies with the nest
operator.
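The distinction between uniformly local and global dependencies can be made concrete with a small sketch. The Employee instance below is hypothetical (it is not the data of Figure 3-1), and `fd_holds` is an illustrative helper that compares set-valued attributes by set equality, as the extended FDs above require.

```python
def fd_holds(rows, lhs, rhs):
    """True if the rows (dicts) satisfy lhs -> rhs; set values compare by set equality."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True

# Hypothetical Employee instance: each tuple carries a nested Skills relation
# whose tuples have an atomic `type` and a set-valued `Exams`.
employee = [
    {"ename": "Smith", "Skills": [
        {"type": "typing", "Exams": frozenset({1982, 1984})},
        {"type": "dictation", "Exams": frozenset({1983})},
    ]},
    {"ename": "Jones", "Skills": [
        {"type": "typing", "Exams": frozenset({1985})},
    ]},
]

# Uniformly local: type -> Exams holds inside every nested Skills relation.
uniformly_local = all(fd_holds(e["Skills"], ["type"], ["Exams"]) for e in employee)

# Unnest Employee on Skills and test the same FD globally.
unnested = [{"ename": e["ename"], **skill} for e in employee for skill in e["Skills"]]
global_after_unnest = fd_holds(unnested, ["type"], ["Exams"])
```

In this instance the FD is uniformly local, yet it fails after unnesting because two employees took different exams for the same skill type.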

A new dependency, directly involving the higher order attributes of
a relation, was introduced by Van Gucht and Fischer. They define the strong
functional dependency (SFD) for one-level schemes [FV1] and the generalized
functional dependency (GFD)† for multi-level schemes [VF]. Since GFDs include
SFDs as a subclass, we will describe the GFD only. Let S be a scheme,
H(S) the higher order attributes of S, and A(S) the lower order attributes of
S. First, we need a recursive definition of intersection for multi-level
schemes called overlap. Let v1, v2 be tuples of s on relation scheme S and let
Y ∈ H(S). We say that v1 and v2 overlap on Y, denoted v1(Y) ovp v2(Y), if and
only if

(1) v1(Y) ∩ v2(Y) ≠ ∅, or

(2) there exist tuples t1 ∈ v1(Y) and t2 ∈ v2(Y) such that
    t1[A(Y)] = t2[A(Y)] and t1(M) ovp t2(M) for all M ∈ H(Y).

Let s be a relation on scheme S, V, Z ⊆ E_S, W ⊆ H(S). We say that s satisfies
the generalized functional dependency V<W> → Z if and only if for any two
tuples t1, t2 ∈ s such that t1(V) = t2(V) and t1(M) ovp t2(M) for all M ∈ W,
we have t1(Z) = t2(Z).

When W = ∅, a GFD is nothing but an ordinary FD. GFDs are used
in [VF] to characterize a class of ¬1NF relations called "permutable nested
relations," and in [Van] to characterize the semantics of some ¬1NF relations.

† This GFD is different from the one used in [Ull, SU1] for generalizing FDs
for 1NF databases.
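The recursive overlap test can be sketched as follows. This is an illustration under stated assumptions, not the formulation of [VF]: a nested relation is modeled as a `frozenset` of Python tuples, atomic components are plain values, higher order components are themselves `frozenset`s, and both arguments are assumed to be over the same scheme. The sample Skills values are hypothetical.

```python
def overlap(a, b):
    """Recursive 'ovp' test between two nested relations over the same scheme."""
    # Case (1): the two sets share a tuple outright.
    if a & b:
        return True
    # Case (2): some pair of tuples agrees on all atomic components and
    # overlaps (recursively) on every higher order component.
    for t1 in a:
        for t2 in b:
            pairs = list(zip(t1, t2))
            atoms_agree = all(x == y for x, y in pairs
                              if not isinstance(x, frozenset))
            nested_overlap = all(overlap(x, y) for x, y in pairs
                                 if isinstance(x, frozenset))
            if atoms_agree and nested_overlap:
                return True
    return False

# Hypothetical Skills values: tuples (type, Exams), Exams a set of (year,) tuples.
skills1 = frozenset({("typing", frozenset({(1982,), (1984,)}))})
skills2 = frozenset({("typing", frozenset({(1984,), (1985,)}))})
skills3 = frozenset({("typing", frozenset({(1990,)}))})
```

Here `skills1` and `skills2` share no tuple, but they overlap through case (2) because both contain a typing tuple whose Exams sets share 1984; `skills1` and `skills3` do not overlap.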

3.3.2 Using Dependencies on 1NF Relations

Several proposals have been made for using dependencies defined on 1NF
relations for the purpose of structuring ¬1NF relations. Ozsoyoglu and Yuan
[OY1] use functional and multivalued dependencies to determine how to set up
a "good" set of ¬1NF relations, which takes advantage of the given
dependencies. For example, let U be a set of attributes, X, Y, and Z a
partition of U, and r a 1NF relation on scheme R = (U). If the multivalued
dependency X →→ Y | Z holds in r then consider the relation s with the Z
attributes forming one nested relation and the Y attributes forming another
nested relation for each X value. Relation s is a ¬1NF relation with several
good properties. First, X is a key for s, giving a unique tuple in s for each
X-value. Second, the Y and Z nested relations are independently updatable;
adding a value to Z (Y) automatically enforces the underlying MVD by matching
all values in Y (Z) with the new value added. Third, we can nest the
underlying relation in any order, first by Z then by Y, or in the reverse
order, and achieve the same relation s. We will discuss these issues further
in section 3.4, where they play a part in normal forms for ¬1NF relations.
MVDs (and EMVDs) can also be expressed as first-order hierarchical
dependencies and generalized hierarchical dependencies [Del]. These
dependencies more easily show the hierarchical structure of a set of MVDs,
but do not provide any more power in the ¬1NF design process.
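The first and third properties can be checked on a small instance. The sketch below is illustrative only: tuples are modeled as frozensets of (attribute, value) pairs, `row` and `nest` are hypothetical helpers rather than the operators defined later in the thesis, and the relation r is constructed so that X →→ Y | Z holds.

```python
from itertools import product

def row(**attrs):
    """A tuple modeled as a hashable set of (attribute, value) pairs."""
    return frozenset(attrs.items())

def nest(rel, attrs, name):
    """Group tuples that agree outside `attrs`; collect `attrs` into attribute `name`."""
    groups = {}
    for t in rel:
        rest = frozenset((a, v) for a, v in t if a not in attrs)
        sub = frozenset((a, v) for a, v in t if a in attrs)
        groups.setdefault(rest, set()).add(sub)
    return {rest | {(name, frozenset(subs))} for rest, subs in groups.items()}

# r satisfies X ->-> Y | Z: for each X value, every Y value meets every Z value.
r = {row(X=x, Y=y, Z=z)
     for x, ys, zs in [("x1", ["y1", "y2"], ["z1", "z2"]), ("x2", ["y3"], ["z1"])]
     for y, z in product(ys, zs)}

s_yz = nest(nest(r, {"Y"}, "Ys"), {"Z"}, "Zs")   # nest on Y, then on Z
s_zy = nest(nest(r, {"Z"}, "Zs"), {"Y"}, "Ys")   # nest on Z, then on Y

x_values = [dict(t)["X"] for t in s_yz]
```

Both nesting orders give the same two-tuple relation, and each X value occurs in exactly one tuple of the result.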

Kambayashi, et al. [KTT, KTTY], give procedures for designing
nested relations using a set of constraints consisting of one join dependency,
functional dependencies satisfied by each component of the join dependency,
and a hierarchy of related attribute sets. A join dependency can be used in
much the same way that we used the MVD above to achieve a nested scheme.
Functional dependencies are used to do further nesting within each nested
relation of the scheme produced using the JD. Note that if the FD X → Y holds
in a relation and we nest on Y, then each nested relation will be a singleton
set. This is clearly a wasted operation, and so [KTT] proposes a scheme where
a chain of FDs (using the transitivity property of FDs) is used and the right
hand sides of the FDs are successively nested to achieve the most redundancy
reduction possible. Related attribute sets are simply collections of attributes
that are grouped together so they can be accessed as a single unit. The
problem with this approach is that a single JD is not enough to characterize
the structure of nested relations to anything more than one-level deep.

A new dependency on 1NF relations was discovered by [JS] to characterize
exactly when two nest operations will commute. The dependency is
called a weak multivalued dependency (WMVD) and is defined as follows. Let
U be a set of attributes and let X, Y, Z be subsets of U such that Z = U − XY.
A WMVD, denoted X →w→ Y, is a template dependency with hypothesis rows
t1, t2, and t3, and a conclusion row t4 such that:

1. t1[X] = t2[X] = t3[X] = t4[X]

2. t1[Y] = t2[Y]

3. t1[Z] = t3[Z]

4. t4[Y] = t3[Y]

5. t4[Z] = t2[Z]

In tableau form, X →w→ Y is the WMVD

          X    Y    Z
    t1:   x    y    z
    t2:   x    y    z'
    t3:   x    y'   z
    ------------------
    t4:   x    y'   z'

A relation r on scheme R = (U) satisfies X →w→ Y if r satisfies the TD
(t1, t2, t3)/t4 given above. In contrast, the ordinary MVD, X →→ Y, would
correspond to the TD (t2, t3)/t4.

The major contribution of the WMVD is its characterization of when
nests commute. Using U, X, Y, Z, and r as above, [JS] showed that X →w→ Y
holds in r if and only if nesting on Y and Z commutes. This of course was
for single-attribute, single-level nesting. Thomas [Tho] extended this result
to nesting on arbitrary structures, and Fischer and Van Gucht [FV3] extend
Thomas' results to more than two nest operations. [FV3] also provides a sound
and complete axiomatization of WMVDs, and [Van] extends this to a sound
and complete axiomatization of a mixed system of MVDs and WMVDs.
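The two template dependencies can be compared directly on small relations. The sketch below is an illustration under simplifying assumptions: relations are sets of (x, y, z) triples with single-attribute X, Y, and Z, and the function names and sample relations are hypothetical.

```python
def satisfies_mvd(r):
    """TD (t2, t3)/t4: any two rows with equal X force the row (x, t3.y, t2.z)."""
    return all((x2, y3, z2) in r
               for (x2, y2, z2) in r
               for (x3, y3, z3) in r
               if x2 == x3)

def satisfies_wmvd(r):
    """TD (t1, t2, t3)/t4: the conclusion is required only when a witness row t1
    shares its Y value with t2 and its Z value with t3."""
    return all((x1, y3, z2) in r
               for (x1, y1, z1) in r
               for (x2, y2, z2) in r
               for (x3, y3, z3) in r
               if x1 == x2 == x3 and y1 == y2 and z1 == z3)

# Weaker than the MVD: no row links y1 with z2, so the WMVD demands nothing.
r_weak = {("x", "y1", "z1"), ("x", "y2", "z2")}

# Violates both: t1=(y1,z1), t2=(y1,z2), t3=(y2,z1) require the missing (y2,z2).
r_bad = {("x", "y1", "z1"), ("x", "y1", "z2"), ("x", "y2", "z1")}

# Full cross product for the single X value: satisfies both.
r_full = {("x", y, z) for y in ("y1", "y2") for z in ("z1", "z2")}
```

The extra hypothesis row t1 is what makes the dependency weak: `r_weak` satisfies the WMVD but not the MVD.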

3.4 Normal Forms for ¬1NF Relations

3.4.1 Horizontal Decomposition

Researchers have suggested that horizontal decomposition or nesting can be
used instead of vertical decomposition to improve database design. Horizontal
decomposition† was suggested by Furtado [Fur] to improve schemes that have no
dependency-preserving BCNF decomposition. A dependency is preserved by a
decomposition if the attributes of the dependency exist in one scheme or the
dependency is implied by the non-trivial dependencies whose attributes are
subsets of a scheme. An example used in [Fur, Sci1, Ull, Van] to illustrate
this is as follows. Consider the relation scheme R = (city, st, zip). A tuple
(c, s, x) is in a relation on scheme R if city c has a building with street
address s, and x is the zip code for that address in that city. We have the
following FDs:

• {city, st} → zip

• zip → city.

The BCNF decomposition of this scheme is

• R1 = (st, zip)

• R2 = (zip, city).

This scheme is not dependency preserving since the attributes of
{city, st} → zip are not included in either R1 or R2 and the only FD which is
included, zip → city, does not imply {city, st} → zip. Thus, we can have legal
instances of relations on the decomposed schemes that do not join to a legal
instance of the original scheme.

† A database model employing horizontal partitioning was developed around the
concept of "quotient relations" by Furtado and Kerschberg [FK]. An algebraic
specification for quotient relations as an abstract data type is found in
[Tom].

[Fur] suggests horizontal partitioning of scheme R1 by city. Then in
each block created by the partitioning the dependency st → zip holds, and
each block is disjoint from all others. Thus, we can assure that the
dependency {city, st} → zip is enforced by checking that the induced
dependency, st → zip, holds in each block, and by checking that zip codes
remain partitioned among the blocks.
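The block-wise check can be sketched as below. The rows are hypothetical (city, st, zip) triples and the function names are illustrative; the sketch only mirrors the two checks described above and is not Furtado's formulation.

```python
from collections import defaultdict

def fd_holds(pairs):
    """True if no left-hand value is paired with two different right-hand values."""
    seen = {}
    return all(seen.setdefault(lhs, rhs) == rhs for lhs, rhs in pairs)

def enforced_by_blocks(rows):
    """Partition rows by city, then check st -> zip inside every block and
    that no zip code occurs in more than one block (zip -> city)."""
    blocks = defaultdict(list)
    for city, st, zip_code in rows:
        blocks[city].append((st, zip_code))
    local_ok = all(fd_holds(block) for block in blocks.values())
    zips_partitioned = fd_holds((z, c) for c, _, z in rows)
    return local_ok and zips_partitioned

rows = [
    ("Springfield", "Main St", "11111"),
    ("Springfield", "Oak Ave", "11112"),
    ("Shelbyville", "Main St", "22221"),
]
# Same city and street with a second zip code: violates {city, st} -> zip.
bad_rows = rows + [("Springfield", "Main St", "11113")]

def global_fd(rows):
    """The original dependency {city, st} -> zip, checked on the whole relation."""
    return fd_holds(((c, s), z) for c, s, z in rows)
```

On both sample instances the block-wise check agrees with the global check of {city, st} → zip.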

When we allow nested relations, then even the initial vertical
decomposition is not necessary. The scheme R = (city, ZS), ZS = (zip, st),
would have the same advantages described above, with blocks now corresponding
to nested relations, and without the disadvantage of having two relations.
However, we can go further. Since st → zip holds in each nested relation,
each value of st is associated with exactly one value of zip. Therefore, we
can nest all st values for a particular zip value into a nested relation,
obtaining the scheme R = (city, ZS), ZS = (zip, ST*), ST* = (st).

              employee
              /       \
      name, dob       type
                        |
                   year, city

    employee →→ name, dob
    employee →→ type, year, city
    employee, type →→ year, city

Figure 3-2. Scheme tree and implied MVDs for employee database.

3.4.2 Nested Normal Form

Ozsoyoglu and Yuan [OY1] introduced the first comprehensive approach to
normalization for ¬1NF relations. They consider nested relations whose schemes
are structured as trees, called scheme trees, and introduce a normal form for
such relations, called nested normal form (NNF). A scheme tree is a tree whose
vertices are labeled by pairwise disjoint sets of zero order attributes, where
the edges of the tree represent MVDs between the attributes in the vertices
of the tree. These MVDs allow a 1NF relation to be represented as a ¬1NF
relation with the good properties discussed in section 3.3. The scheme tree and
associated MVDs for the Emp scheme are shown in Figure 3-2.

Formally, let U be a set of zero order attributes, T be a scheme tree,
and e = (u, v) be an edge of T. Let A(v) be the union of all ancestors of v,
including v, D(v) be the union of all descendants of v, including v, and S(T)
be the union of all attributes in T. Then the MVD represented by the edge e is
A(u) →→ D(v) in the context of S(T). Also, let MVD(T) be the set of MVDs
represented by the edges of T.

Definition 3.1: [OY1] Let T be a scheme tree, and u1, u2, ..., un be all the
leaf nodes of T. Then the path set of T, denoted P(T), is
{A(u1), A(u2), ..., A(un)}. Note that, for a leaf node u, A(u) is the union of
all the nodes in the path from the root of T to u in T.
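As an illustration of these definitions, the sketch below computes MVD(T) and P(T) for the employee scheme tree of Figure 3-2. The dictionary encoding of the tree and the function names are assumptions made for this example only.

```python
# Scheme tree of Figure 3-2: each vertex is a set of zero order attributes.
root = frozenset({"employee"})
tree = {
    root: [frozenset({"name", "dob"}), frozenset({"type"})],
    frozenset({"type"}): [frozenset({"year", "city"})],
}

def descendants(v):
    """D(v): union of v and all vertices below it."""
    out = set(v)
    for child in tree.get(v, []):
        out |= descendants(child)
    return frozenset(out)

def edge_mvds_and_paths(v, ancestors=frozenset()):
    """Return (MVDs of edges below v, path-set members A(u) for leaves u below v)."""
    a_v = ancestors | v                      # A(v): ancestors of v, including v
    children = tree.get(v, [])
    if not children:
        return [], {a_v}
    mvds, paths = [], set()
    for child in children:
        mvds.append((a_v, descendants(child)))   # edge (v, child): A(v) ->-> D(child)
        sub_mvds, sub_paths = edge_mvds_and_paths(child, a_v)
        mvds += sub_mvds
        paths |= sub_paths
    return mvds, paths

mvds, path_set = edge_mvds_and_paths(root)
```

The three pairs in `mvds` correspond to the three MVDs listed in Figure 3-2, and `path_set` contains one attribute set per leaf.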


The following proposition gives some properties of a scheme tree.

Proposition 3.1: [OY1] If T is a scheme tree, then

1. P(T) is an acyclic database scheme,

2. MVD(T) ⟺ ⋈(P(T)), and

3. MVD(T) is conflict free. □

Let T be a scheme tree with respect to M, where S(T) ⊆ U, and
(u, v) be an edge in T. Assume there is a key X of M [OY2] such that there
exists Z ∈ DEP(X) and D(v) = Z ∩ S(T). Then, v is said to be partial
redundant in T with respect to X if X ⊂ A(u). The MVD, X →→ D(v), in the
context of S(T), is a partial dependency in S(T). Similarly, if there exist
some sibling nodes v1, v2, ..., vn of v in T such that W = D(v1) ∪ ... ∪ D(vn),
X ⊆ A(u)W, and M does not imply XW →→ D(v) in the context of S(T), then v is
said to be transitive redundant with respect to X in T. In this case, the MVD,
X →→ D(v), in the context of S(T), is said to be a transitive dependency in
S(T).

In order to avoid dividing keys in trees, we define a set of attributes
called a fundamental key. Let M be a set of MVDs on U and V ⊆ U. The set
of fundamental keys on V, denoted FK(V), is defined by:

    FK(V) = {V ∩ X | X ∈ LHS(M) and V ∩ X ≠ ∅,
             and there is no Y ∈ LHS(M) such that X ∩ V ⊃ Y ∩ V ≠ ∅}.

Given a set M of MVDs on attributes U, [OY1] gives an algorithm to
decompose U into a set of scheme trees which do not have partial or transitive
redundancies and do not divide keys in the trees. A normal scheme tree is
defined as follows.

Definition 3.2: A scheme tree T is said to be normal with respect to a set of
MVDs, M, if

1. M implies MVD(T).

2. There are no partial dependencies in T.

3. There are no transitive dependencies in T.

4. The root of T is a key, and for each other node u in T, if
   FK(D(u)) ≠ ∅, then u ∈ FK(D(u)).

The method proposed in [OY1] uses MVDs and the MVD counterpart
of FDs (via rule FD-MVD1) as input to the NNF decomposition algorithm.
In [YO], the authors have combined FDs and MVDs into an envelope set of
dependencies. They propose that this envelope set could be used as input to
a slightly modified NNF algorithm which would then take into account the
different semantics of FDs and MVDs. Using the algorithm in [OY1], singleton
sets are likely to appear when FDs are used to perform the decomposition.
In Chapter 9, we propose a new method for achieving nested normal form which
takes into account the different semantics of FDs.

3.5 ¬1NF Applications

In this section we sample a variety of applications for ¬1NF relations. We
will describe and, where space permits, show an example of ¬1NF relations to
model office forms, complex objects and CAD, statistical databases, information
retrieval systems, and a relational operating system interface.

3.5.1  Office  Forms 

The implementation of office forms in a database system is discussed in
[AH, KTW, SLTC]. In [AH], the format model is used as a foundation for
studying the structure of forms as they arise in office information systems.
Form systems based on ¬1NF relations are described in [KTW]. They propose a
design methodology for conceptual modeling of ¬1NF relations, especially to
represent the semantic concepts needed for form systems, and give an overview
of a prototype implementation of a form system at the University of Vienna.
A formal means for specification of forms processing is presented in [SLTC].
Figure 3-3 shows how an invoice form would appear as a ¬1NF relation. Note
that this is a user view; the stored data would not include the amount and
total columns, as these are derived from the other data in the relation.

Invoices

    cname   caddress   Orders                              total
                       prod-no   qty   price   amount
    -----------------------------------------------------------
    Smith   Chicago    102        10    1.30    13.00      80.00
                       210         1   67.00    67.00
    West    Auburn     102         5    1.30     6.50      20.80
                       213        43    0.10     4.30
                       456        10    1.00    10.00

Figure 3-3. An invoice form represented as a ¬1NF relation.
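The derived columns of the invoice view can be computed from the stored data alone. The following sketch assumes a dictionary encoding of the stored relation (the field name `prod_no` and the function `invoice_view` are illustrative) and uses `Decimal` so that the currency arithmetic is exact.

```python
from decimal import Decimal

# Stored data: the amount and total columns are deliberately absent.
stored = [
    {"cname": "Smith", "caddress": "Chicago", "Orders": [
        {"prod_no": 102, "qty": 10, "price": Decimal("1.30")},
        {"prod_no": 210, "qty": 1, "price": Decimal("67.00")},
    ]},
    {"cname": "West", "caddress": "Auburn", "Orders": [
        {"prod_no": 102, "qty": 5, "price": Decimal("1.30")},
        {"prod_no": 213, "qty": 43, "price": Decimal("0.10")},
        {"prod_no": 456, "qty": 10, "price": Decimal("1.00")},
    ]},
]

def invoice_view(stored):
    """Build the user view: amount = qty * price per order, total = sum of amounts."""
    view = []
    for invoice in stored:
        orders = [{**o, "amount": o["qty"] * o["price"]} for o in invoice["Orders"]]
        total = sum(o["amount"] for o in orders)
        view.append({**invoice, "Orders": orders, "total": total})
    return view

view = invoice_view(stored)
```

The computed totals match the total column shown for Smith and West in Figure 3-3.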

3.5.2  Complex  Objects  and  CAD 


Complex objects and CAD applications are obvious candidates for the ¬1NF
model. Issues involved in using the relational model for these applications are
discussed in [BaKh, BaKi, HL, Lor, ML]. Most of this research is concerned
with how to model complex objects using the traditional relational model. An
example from [Lor] will illustrate how we can use ¬1NF relations to our
advantage in this environment. Let us consider the design of electronic
components. A particular component is called an entity. An entity can comprise
several other entities at a different level. Consider, for example, a 4-AND
entity built out of three elementary 2-AND gates. A design for the 4-AND
entity is illustrated in Figure 3-4.

Figure 3-4. Design of a 4-AND component.

The description, both topological and graphical, can be mapped into
relations. Figure 3-5 shows the contents of the 1NF relations for a simple
design. The relation Entity contains, for each entity, its unique
identification number and its name. The relations Geometry and Pins specify,
for each entity, its exterior representation. Geometry specifies the lines
drawn from (x1, y1) to (x2, y2) while the relation Pins specifies the exterior
pins of the entity: pin number, class (input or output) and position. The
internal contents of an entity are specified in terms of other entities that
are used to build the higher level entity. An instance of an entity used
inside another entity is called a block. The relation Blocks specifies the
blocks used inside an entity: the number of each block, the type (the
identifier of the entity of which this block is an instance), and some
graphical information such as position and scale. The relation Connections
shows the topology of the connections inside an entity. Each row gives the
connection number and the two block/pins that are connected. Graphically, a
connection is represented as a line built out of one or several segments. A
relation Conx-segments contains a row for every intermediate point in a
connection between two block/pins; if a connection is made out of a single
line segment there is no corresponding row in Conx-segments.

Figure 3-6. ¬1NF relation for circuit design.


Six relations are needed to represent the circuit design database, even
though there is only a single object being modeled. A ¬1NF design for this
database uses only one relation, as in Figure 3-6. Each tuple of this relation
contains all of the data on each entity: its id, name, geometry, pins, blocks,
and connections. The user can more easily see the entire design of an entity,
and queries will be easier to formulate, since only one relation need be
queried.




SUM-SALARY-OF-EMPLOYEES

    DIV:   div1, div2
    DEPT:  man, personnel, acct

    AGE        SUM-SAL
    [18,30]    100K    150K    290K
    [31,40]    200K    300K    400K
    [41,60]    250K    350K    250K

Figure 3-7. An example summary table: SUM-SALARY-OF-EMPLOYEES

3.5.3  Statistical  Databases 


Statistical databases are a natural candidate for ¬1NF relations since data
are grouped so that statistics can be applied to them. Modeling of
statistical database applications was done by [Joh, OO2]. A query language
and physical organization techniques for a statistical database are described
in [OOM1, OO1, OO3]. In Figure 3-7, we show an example of a "summary table"
from [OO3]. This table shows, for each age-group, the sum of the salaries of
employees in each department of each division of some company. This same
data can be represented as a ¬1NF relation as shown in Figure 3-8.

Figure 3-8. Summary table as a ¬1NF relation.

3.5.4 Relational Operating System Interface

Korth and Silberschatz [Kor, KS] propose extending the relational model to
support an operating system interface. The ability to use a ¬1NF model greatly
enhances this idea. For example, one function of an operating system is to
allow users to communicate with each other by exchanging messages. Such a mail
system could be represented by the two relations in-mail on scheme (sender,
cc-list, subject, date-received, text) and out-mail on scheme (to, cc-list,
subject, date-sent, text). Mail is read by querying the in-mail relation, and
mail is sent by adding a tuple to the out-mail relation. The attributes
cc-list and text are set-valued attributes. The cc-list attribute contains all
addressees which will get a copy of the message, and text can be broken down
into lines or words. If a 1NF view of this relation were needed, each
addressee and each line of text would force another tuple to be added to the
mail relations. To overcome the additional redundancy this causes, we would
have to decompose the relations in the database, causing the user's view of
mail to become fragmented and complicating the use of the mail system.


Books

    Authors    Title   Price   Descriptors
    --------------------------------------
    A1, A2     T1      P1      D1, D2
    A2         T1      P2      D1, D2
    A1         T1      P1      D1, D2, D3

Figure 3-9. Books table as a ¬1NF relation.

3.5.5  Information  Retrieval  Systems 

There is a trend towards integrating database management systems and
information retrieval systems. [GP, Mac, PS, Sch1, SP] describe methods for
enhancing relational database systems to support the information retrieval
application. Most of this work is concerned with textual data; however,
pictorial and graphical data have some similar support requirements. The
Advanced Information Management (AIM) project has been running at the IBM
Heidelberg Scientific Center since 1978. This project is testing the
feasibility of integrating the management of formatted and unformatted data
into a ¬1NF relational database. Figure 3-9 shows a ¬1NF book inventory table
in the style of [SP], while the normalized 1NF version of this table,
requiring three relations, is shown in Figure 3-10. There are also many common
information retrieval requests which are hard to formulate on the basis of the
data structure in Figure 3-10, such as

    Display title and price of books described by both descriptors
    D1 and D2 and written by author A1.


Figure 3-11a gives an SQL-like [C+] formulation of this query.
Intuitively, a simpler formulation should be possible, as indicated in
Figure 3-11b. In Chapter 8, we present an SQL-like extension for ¬1NF
databases which makes possible queries like the one shown in Figure 3-11b.

    Book    Author    Descriptor

Figure 3-10. 1NF relations corresponding to Books table of Figure 3-9.

(a)
    SELECT Title, Price
    FROM   Book, Author,
           Descriptor X, Descriptor Y
    WHERE  Author = A2
      AND  Author.Bno = Book.Bno
      AND  Book.Bno = X.Bno
      AND  X.Descriptor = D1
      AND  Book.Bno = Y.Bno
      AND  Y.Descriptor = D2

(b)
    SELECT Title, Price
    FROM   Books
    WHERE  Descriptors ⊇ {D1, D2}
      AND  Authors ⊇ {A2}

Figure 3-11. Formulation of query in (a) SQL referring to Figure 3-10, and
(b) extended SQL referring to Figure 3-9.
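The set-containment form of the query in Figure 3-11b has a direct analogue in any language with set values. The sketch below is illustrative only: it encodes the rows of Figure 3-9 as Python dictionaries with set-valued Authors and Descriptors (the symbolic values are taken as they appear in the extracted figure) and is not the SQL/NF syntax of Chapter 8.

```python
# Rows of the Books table, with set-valued Authors and Descriptors.
books = [
    {"Authors": {"A1", "A2"}, "Title": "T1", "Price": "P1",
     "Descriptors": {"D1", "D2"}},
    {"Authors": {"A2"}, "Title": "T1", "Price": "P2",
     "Descriptors": {"D1", "D2"}},
    {"Authors": {"A1"}, "Title": "T1", "Price": "P1",
     "Descriptors": {"D1", "D2", "D3"}},
]

# WHERE Descriptors ⊇ {D1, D2} AND Authors ⊇ {A2}: `>=` is Python's superset test.
result = [
    (b["Title"], b["Price"])
    for b in books
    if b["Descriptors"] >= {"D1", "D2"} and b["Authors"] >= {"A2"}
]
```

A single pass over one relation replaces the four-way join of the 1NF formulation, since each tuple already carries its complete author and descriptor sets.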


Chapter  4 

Formal  Query  Languages 


In this chapter, we provide formal definitions for a tuple relational calculus
and a relational algebra extended for the ¬1NF model. The proof that these
two formal languages are equivalent will be given in Chapter 6 after we have
introduced some extended algebra operators in Chapter 5. These extended
operators can be expressed in terms of the basic algebra operators and will
simplify the proof development. Note that in this chapter we do not allow null
values or empty nested relations. See Chapter 7 for a thorough treatment of
null values.

4.1  Extended  Relational  Calculus 

Using the notation from Chapter 2, we define a tuple relational calculus (TRC)
with expressions of the form {t | ψ(t)}, where t is a tuple variable of fixed
length and ψ is a formula built from atoms and a collection of operators
defined below.

The atoms of formulas ψ are of four types.

1. s ∈ r, where s is a tuple variable, and r is a relation name. This
   specifies that s is a tuple in relation r, or s is an element of r. The
   arity of s is equal to the degree of r.

2. s ∈ t[i], where t and s are tuple variables. This specifies that s is a
   tuple in the relation specified by the ith component of t, whose value
   must be a set-of-tuples. The arity of s is the arity of the tuples in the
   set.

3. a θ s[i], s[i] θ a, s[i] θ t[j], where s and t are tuple variables, a
   is a constant, and θ is an arithmetic comparison operator (=, >).
   Note that constants may be simple values or non-empty sets-of-values;
   however, equality is the only operator which can compare non-simple
   values. Although other comparison operators, such as <, ⊇, ⊂, etc.,
   are legitimate operators and could be included in the calculus, for
   simplicity we use only = and >. Expressions using these additional
   operators can be expressed with calculus expressions which do not use
   them.

4. s[i] = {u | ψ'(u, t1, t2, ..., tk)}, where ψ' is a formula with free tuple
   variables u, t1, t2, ..., tk, and s is some tj. This specifies that the
   ith attribute of s is the set of u tuples such that ψ' holds. Note, if no
   tuples u satisfy ψ' then this atom evaluates to false. This is to comply
   with our requirement that no null values appear in instances.

As in Chapter 2, formulas are defined with the operators (¬, ∧, ∨, ∀, ∃).

To illustrate these concepts, let us consider a number of examples.

1. Given a 1NF relation r on scheme R = (A, B), the TRC expression
   which nests r on the B attribute producing a relation with scheme
   R' = (A, B'), B' = (B) is:

   {t⁽²⁾ | (∃s)(s ∈ r ∧ t[1] = s[1]
           ∧ t[2] = {u⁽¹⁾ | (∃v)(v ∈ r ∧ s[1] = v[1] ∧ u[1] = v[2])})}

2. Given a nested relation r with scheme R = (A, B'), B' = (B), the 1NF
   relation with scheme R' = (A, B) is:

   {t⁽²⁾ | (∃s)(s ∈ r ∧ t[1] = s[1] ∧ (∃u)(u ∈ s[2] ∧ t[2] = u[1]))}

3. Given a nested relation r with scheme R = (A, B, E'), B = (C, D'),
   D' = (D), E' = (E), the set of all tuples in r with a C value of 'c' and
   within that B tuple a D value of 'd', is:

   {t | t ∈ r ∧ (∃s)(s ∈ t[2] ∧ s[1] = 'c' ∧ (∃u)(u ∈ s[2] ∧ u[1] = 'd'))}

4. Given a nested relation as in example 3, the set of all tuples in r,
   removing all B tuples from each B subrelation that do not have any
   D values greater than 6, and in those that do, eliminating all D values
   ≤ 6, is:

   {t⁽³⁾ | (∃s)(s ∈ r ∧ t[1] = s[1] ∧ t[3] = s[3]
           ∧ t[2] = {u⁽²⁾ | (∃v)(v ∈ s[2] ∧ u[1] = v[1]
                     ∧ u[2] = {w⁽¹⁾ | w ∈ v[2] ∧ w[1] > 6})})}

Figure 4-1 shows a sample relation r and the result of this query.

Figure 4-1. Relation r and result of calculus query 4.
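The calculus expressions of examples 1, 2, and 4 translate almost literally into set comprehensions. The sketch below is an illustration under stated assumptions: tuples are Python tuples indexed from 0 rather than 1, nested relations are `frozenset`s of tuples, the function names are hypothetical, and the sample relations are invented (they are not the data of Figure 4-1).

```python
def nest_on_b(r):
    """Example 1: t[1] = s[1], t[2] = {u | exists v in r with s[1] = v[1], u[1] = v[2]}."""
    return {(s[0], frozenset((v[1],) for v in r if v[0] == s[0])) for s in r}

def unnest_b(r):
    """Example 2: one flat tuple (s[1], u[1]) for every u in s[2]."""
    return {(s[0], u[0]) for s in r for u in s[1]}

def query4(r):
    """Example 4: keep only D values > 6. A set-former atom that would produce an
    empty set evaluates to false, so empty D sets and empty B sets drop out."""
    out = set()
    for a, b, e in r:
        new_b = set()
        for c, ds in b:
            kept = frozenset(w for w in ds if w[0] > 6)
            if kept:
                new_b.add((c, kept))
        if new_b:
            out.add((a, frozenset(new_b), e))
    return out

flat = {("a1", "b1"), ("a1", "b2"), ("a2", "b1")}
nested = nest_on_b(flat)

r = {
    ("a1",
     frozenset({("c1", frozenset({(5,), (7,), (9,)})),
                ("c2", frozenset({(1,), (6,)}))}),
     frozenset({("e1",)})),
    ("a2",
     frozenset({("c3", frozenset({(2,)}))}),
     frozenset({("e2",)})),
}
answer = query4(r)
```

In the sample, the `c2` tuple and the whole `a2` tuple disappear from the answer because no D value greater than 6 survives in them, which reflects the rule in atom type 4 that an empty set former makes the atom false.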


As we pointed out in Chapter 2, the TRC allows us to define some
infinite relations such as {t | ¬(t ∈ r)}, which denotes all possible tuples
that are not in r, but are of the arity we associate with t. These types of
expressions have not been eliminated in our present calculus and can even
occur in nested expressions.


To overcome this problem, the notion of safety must be extended
to the ¬1NF calculus. Safe expressions are those expressions for which the
answer can be computed in finite time by examining only the relations and
constants mentioned in the expression. As for the 1NF calculus we denote the
set of symbols that appear in relations or constants mentioned in expression
ψ as DOM(ψ). However, in the ¬1NF calculus the symbols may also appear
in nested relations. An expression ψ is safe if each component of any t that
satisfies ψ must be a member of or, recursively, a relation on DOM(ψ). This
statement replaces the first constraint listed under safety in Chapter 2. The
second and third constraints are similarly modified so that the components of
a tuple variable are recursively accessed, allowing the components of nested
relations to be tested.

We also add a fourth constraint to the definition of safe expressions to eliminate the uncontrolled creation of powersets. This new constraint is necessary because we have introduced the new atom, $s \in t[i]$. This atom states that tuple variable s must assume values which are elements of the ith attribute of tuple variable t. Thus, $t[i]$, if not further constrained in the expression, can assume any set of values as long as a value for s is a member of that set. The first three safety constraints have only the capability of limiting the values for these sets to those in $DOM(\psi)$, the worst case being the powerset of $DOM(\psi)$. Our fourth constraint is as follows:

4. If an atom of the form $s \in t[i]$ appears in an expression then one of the following cases holds:

a. Tuple variable t appears in an atom of type $t \in r$.

b. Tuple variable t appears in an atom of type $t \in u[j]$.

c. The ith component of t appears in an atom of type $t[i] = u[j]$ or $u[j] = t[i]$ and if u appears in an atom of the form $q \in u[j]$ then safety constraint 4 is satisfied for u without considering the atoms involving $t[i]$ which invoked this case.

d. The ith component of t appears in an atom of type $t[i] = \{u \mid \psi'(u)\}$.

With this modification of $DOM(\psi)$ and the addition of constraint 4, and the proviso that each calculus expression, nested or otherwise, must be safe, our definition of safety for the ¬1NF calculus is complete.

4.2  Extended  Relational  Algebra 

In order to have the same power as the safe relational calculus, we need to add only two new operators to the basic set of union, set difference, cartesian product, projection, and selection. These are the nest ($\nu$) and unnest ($\mu$) operators as defined in [JS, FT]. The basic set of operators works exactly as before, except that the domains may now be either atomic or set-valued.

1. Nest takes a relation structure $\mathcal{R} = (R, r)$ and aggregates over equal data values in some subset of the names in R. Formally, let R be a relation scheme, in database scheme S, which contains a rule $R = (A_1, A_2, \ldots, A_n)$ for external name R. Let $\{B_1, B_2, \ldots, B_m\} \subset E_R$ and $\{C_1, C_2, \ldots, C_k\} = E_R - \{B_1, B_2, \ldots, B_m\}$. Assume that either the rule $B = (B_1, B_2, \ldots, B_m)$ is in S or that B does not appear on the left hand side of any rule in S and $(B_1, B_2, \ldots, B_m)$ does not appear on the right hand side of any rule in S. Then $\nu_{B=(B_1,B_2,\ldots,B_m)}(\mathcal{R}) = (R', r') = \mathcal{R}'$ where:

1. $R' = (C_1, C_2, \ldots, C_k, (B_1, B_2, \ldots, B_m)) = (C_1, C_2, \ldots, C_k, B)$ and the rule $B = (B_1, B_2, \ldots, B_m)$ is appended to the set of rules in S if it is not already in S, and

2. $r' = \{t \mid$ there exists a tuple $u \in r$ such that $t[C_1 C_2 \cdots C_k] = u[C_1 C_2 \cdots C_k] \wedge t[B] = \{v[B_1 B_2 \cdots B_m] \mid v \in r \wedge v[C_1 C_2 \cdots C_k] = t[C_1 C_2 \cdots C_k]\}\}$.

2. Unnest takes a relation structure nested on some set of attributes and disaggregates the structure to make it a "flatter" structure. Formally, let R be a relation scheme, in database scheme S, which contains a rule $R = (A_1, A_2, \ldots, A_n)$ for external name R. Assume B is some higher order name in $E_R$ with an associated rule $B = (B_1, B_2, \ldots, B_m)$. Let $\{C_1, C_2, \ldots, C_k\} = E_R - B$. Then $\mu_{B=(B_1,B_2,\ldots,B_m)}(\mathcal{R}) = (R', r') = \mathcal{R}'$ where:

1. $R' = (C_1, C_2, \ldots, C_k, B_1, B_2, \ldots, B_m)$ and the rule $B = (B_1, B_2, \ldots, B_m)$ is removed from the set of rules in S if it does not appear in any other relation scheme, and


2. $r' = \{t \mid$ there exists a tuple $u \in r$ such that $t[C_1 C_2 \cdots C_k] = u[C_1 C_2 \cdots C_k] \wedge t[B_1 B_2 \cdots B_m] \in u[B]\}$.


Note  that  unnesting  an  empty  set  produces  no  tuples;  however,  since 
we  do  not  allow  empty  nested  relations  and  since  the  other  algebra 
operators,  in  particular  the  nest  operator,  cannot  produce  them,  there 
should  be  no  need  to  apply  unnest  to  an  empty  set. 
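The two operators can be illustrated with a minimal Python sketch. The representation is an assumption made only for this illustration: a tuple is a frozenset of (attribute, value) pairs, a relation is a frozenset of tuples, and the helper names `row`, `nest` and `unnest` are hypothetical. Scheme rules in S are not tracked; only the instance-level behavior is modeled.

```python
def row(**attrs):
    """Build a tuple as an immutable set of (attribute name, value) pairs."""
    return frozenset(attrs.items())


def nest(r, new_name, nested_attrs):
    """Sketch of nu_{new_name=(nested_attrs)}(r).

    Tuples that agree on all attributes outside nested_attrs are grouped,
    and their nested_attrs projections are collected into one set value.
    """
    groups = {}
    for t in r:
        d = dict(t)
        key = frozenset((a, v) for a, v in d.items() if a not in nested_attrs)
        inner = frozenset((a, d[a]) for a in nested_attrs)
        groups.setdefault(key, set()).add(inner)
    return frozenset(
        key | {(new_name, frozenset(members))} for key, members in groups.items()
    )


def unnest(r, name):
    """Sketch of mu_{name}(r): one output tuple per member of each nested set."""
    out = set()
    for t in r:
        d = dict(t)
        rest = frozenset((a, v) for a, v in d.items() if a != name)
        for inner in d[name]:
            out.add(rest | inner)
    return frozenset(out)


# Hypothetical 1NF relation on scheme (A, C, D, E).
r = frozenset({
    row(A=1, C=1, D=1, E=1),
    row(A=1, C=2, D=2, E=1),
    row(A=2, C=1, D=1, E=2),
})
nested = nest(r, "B", ("C", "D"))   # scheme (A, B, E) with B = (C, D)
flat_again = unnest(nested, "B")    # back to scheme (A, C, D, E)
```

In this sketch, unnesting directly after nesting a 1NF relation recovers the original tuples; the converse direction does not hold in general, which is the issue taken up in Chapter 5.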

We can apply unnest to a relation as long as it still contains nested relations. Thomas and Fischer [TF] showed that the order of unnesting does not affect the content of the resulting 1NF relation. They defined the UNNEST* operator to transform any ¬1NF relation to a 1NF one. We will use $\mu^*$ to indicate this operation.

We  often  omit  the  right  hand  side  of  rules  in  unnest  operations  since 
the  rule  name  is  adequate.  In  a  similar  manner,  when  writing  a  nest  operation 
we  may  choose  not  to  specify  the  name  of  the  rule  to  be  added  to  S,  only  the 
name  of  the  attributes  to  be  nested.  When  this  is  done,  we  assume  that  a 
unique  rule  name  is  generated  if  the  names  being  nested  do  not  already  appear 
on  the  right  hand  side  of  any  rule  in  S. 

Let  us  consider  a  number  of  examples  to  illustrate  these  concepts. 

1. Given the relation r on scheme $R = (A, C, D, E)$, the relation with the C and D attributes nested together, and renamed B, is:

\[ \nu_{B=(C,D)}(r) \]

This produces the scheme $R' = (A, B, E)$, $B = (C, D)$.


Figure  4-2.  Relation  r  and  result  of  algebra  query  2. 

2. Using the same relation r, the relation with scheme $R' = (A, B, E')$, $B = (C, D)$, $E' = (E)$ is:

\[ \nu_{B=(C,D)}(\nu_{E'=(E)}(r)) \quad \text{or} \quad \nu_{E'=(E)}(\nu_{B=(C,D)}(r)) \]

Although  both  of  these  expressions  produce  the  desired  scheme,  the 
relations  may  be  radically  different  (see  Figure  4-2). 

3. The relation on scheme $R' = (A, B, E)$, $B = (C, D')$, $D' = (D)$ produced from r is:

\[ \nu_{B=(C,D')}(\nu_{D'=(D)}(r)) \]

In  this  case  only  one  order  is  possible  since  D  must  be  nested  before 
D'  can  be  further  nested  as  part  of  B. 

4. Given the relation s on $S = (A, B, E')$, $B = (C, D)$, $E' = (E)$, the relation with attribute E' unnested is:

\[ \mu_{E'}(s) \]

5. Given relation s on S as in 4, the relation with attribute B unnested, giving the scheme $S' = (A, C, D, E')$, $E' = (E)$, is:

\[ \mu_{B}(s) \]


6. Given relation s on S as in 4, the relation with each of the D' sets within each B subrelation unnested, producing the relation with scheme $S' = (A, B, E')$, $B = (C, D)$, $E' = (E)$, is:

\[ \nu_{B=(C,D)}(\mu_{D'}(\mu_{B}(s))) \]




Chapter  5 

Partitioned  Normal  Form  and 
Extended  Algebra  Operators 


In this chapter, we consider a restriction of ¬1NF relations to those that are in partitioned normal form. We then define a set of extended algebra operators under which the class of partitioned normal form relations is closed. These extended operators are designed to be reasonable extensions to their 1NF counterparts, making use of the implied multivalued dependencies which exist when relations are in partitioned normal form.

5.1 Restricting the Class of ¬1NF Relations

Consider  the  relation  scheme 

Student  =  (sname,  Course) 

Course  =  (cname,  grade) 

In Figure 5-1 we have two instances of Student, $S_1$ and $S_2$, where $S_1$ contains previous work of two students and $S_2$ contains some new data on these students.

A natural step would be to add the new information in $S_2$ to that in $S_1$. If we apply the union operator then we get the relation in Figure 5-2.

S1
sname    Course
         cname       grade
Jones    Math        A
         Science     B
Smith    Math        A
         Physics     C
         Science     A

S2
sname    Course
         cname       grade
Jones    Physics     B
Smith    Chemistry   A
         English     B

Figure 5-1. Two Student instances.

S1 ∪ S2
sname    Course
         cname       grade
Jones    Math        A
         Science     B
Jones    Physics     B
Smith    Math        A
         Physics     C
         Science     A
Smith    Chemistry   A
         English     B

Figure 5-2. Union of instances in Figure 5-1.

Although all of the information is certainly represented in this relation, it lacks the intuitive appeal of the relation in Figure 5-3, in which the Course sets are combined for each unique student. One alternative is to use an unnest operation followed by the corresponding nest operation after taking the union. So the query would be

\[ \nu_{Course}(\mu_{Course}(S_1 \cup S_2)) \]



This  takes  advantage  of  the  property  that,  in  general,  nest  is  not 
always  an  inverse  operator  for  unnest.  This  property  is  intuitively  unappealing 
and  impedes  query  optimization. 
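The effect can be reproduced on the instances of Figure 5-1 with a small Python sketch. The encoding (tuples as frozensets of attribute/value pairs) and the helper names `row`, `courses`, `nest` and `unnest` are assumptions of this illustration, not part of the model.

```python
def row(**attrs):
    """A tuple is an immutable set of (attribute name, value) pairs."""
    return frozenset(attrs.items())


def nest(r, new_name, nested_attrs):
    """Group tuples agreeing outside nested_attrs; collect the rest as a set."""
    groups = {}
    for t in r:
        d = dict(t)
        key = frozenset((a, v) for a, v in d.items() if a not in nested_attrs)
        groups.setdefault(key, set()).add(
            frozenset((a, d[a]) for a in nested_attrs))
    return frozenset(
        key | {(new_name, frozenset(members))} for key, members in groups.items()
    )


def unnest(r, name):
    """Emit one flat tuple per member of the nested relation under `name`."""
    out = set()
    for t in r:
        d = dict(t)
        rest = frozenset((a, v) for a, v in d.items() if a != name)
        out.update(rest | inner for inner in d[name])
    return frozenset(out)


def courses(*pairs):
    """Build a nested Course relation from (cname, grade) pairs."""
    return frozenset(row(cname=c, grade=g) for c, g in pairs)


S1 = frozenset({
    row(sname="Jones", Course=courses(("Math", "A"), ("Science", "B"))),
    row(sname="Smith",
        Course=courses(("Math", "A"), ("Physics", "C"), ("Science", "A"))),
})
S2 = frozenset({
    row(sname="Jones", Course=courses(("Physics", "B"))),
    row(sname="Smith", Course=courses(("Chemistry", "A"), ("English", "B"))),
})

plain_union = S1 | S2                       # Figure 5-2: four tuples
regrouped = nest(unnest(plain_union, "Course"),
                 "Course", ("cname", "grade"))  # Figure 5-3: two tuples
```

Here `regrouped` differs from `plain_union`, so nesting did not undo the unnest: the standard union produced a relation whose sname value does not determine the Course set.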
sname    Course
         cname       grade
Jones    Math        A
         Science     B
         Physics     B
Smith    Math        A
         Physics     C
         Science     A
         Chemistry   A
         English     B

Figure 5-3. Better representation of Figure 5-2.

We, therefore, define a class of ¬1NF relations for which there is always a sequence of nest operations which will be an inverse for any sequence of valid unnest operations. In the next section, we extend the meaning of our relational algebra operators to work within this domain.

Definition 5.1: Let $\mathcal{R} = (R, r)$ be a relation structure with attribute set $E_R$ containing zero order attributes $A_1, A_2, \ldots, A_k$ and higher order attributes $X_1, X_2, \ldots, X_\ell$. $\mathcal{R}$ is in partitioned normal form (PNF) if and only if the following two conditions hold:

(a) $A_1 A_2 \cdots A_k \rightarrow E_R$, and

(b) For all $t \in r$ and for all $X_i : 1 \le i \le \ell$ : $\mathcal{R}_{t_i}$ is in PNF, where $\mathcal{R}_{t_i} = \langle X_i, t[X_i] \rangle$.

Note, if $k = 0$ then $\emptyset \rightarrow E_R$ must hold and if $\ell = 0$ then $A_1 A_2 \cdots A_k \rightarrow A_1 A_2 \cdots A_k$ holds trivially. Thus a 1NF relation is in PNF.
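Definition 5.1 translates into a short recursive check. The following Python sketch assumes a representation chosen only for illustration: a tuple is a frozenset of (attribute, value) pairs, and a value is treated as a higher order attribute exactly when it is itself a frozenset. The names `row`, `courses` and `is_pnf` are hypothetical.

```python
def row(**attrs):
    """A tuple is an immutable set of (attribute name, value) pairs."""
    return frozenset(attrs.items())


def is_pnf(r):
    """Check conditions (a) and (b) of Definition 5.1 on an instance.

    (a) the zero order attributes determine the whole tuple, i.e. no two
        distinct tuples agree on all atomic values;
    (b) every nested relation is itself in PNF.
    """
    seen_keys = set()
    for t in r:
        key = frozenset((a, v) for a, v in t if not isinstance(v, frozenset))
        if key in seen_keys:          # two tuples share the same atomic part
            return False
        seen_keys.add(key)
        for _, v in t:
            if isinstance(v, frozenset) and not is_pnf(v):
                return False
    return True


def courses(*pairs):
    """Build a nested Course relation from (cname, grade) pairs."""
    return frozenset(row(cname=c, grade=g) for c, g in pairs)


# S1 of Figure 5-1 and the extra Jones tuple that appears in Figure 5-2.
S1 = frozenset({
    row(sname="Jones", Course=courses(("Math", "A"), ("Science", "B"))),
    row(sname="Smith", Course=courses(("Math", "A"), ("Physics", "C"))),
})
with_duplicate_key = S1 | {row(sname="Jones", Course=courses(("Physics", "B")))}
```

When a scheme has no zero order attributes, the key is empty for every tuple, so the check accepts at most one tuple, which matches the $k = 0$ remark in the definition.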
PNF is a desirable goal for the representation of relationships in ¬1NF relations. This stems from our belief that a particular nesting scheme should not be used unless the FDs which enforce PNF hold in the relation. We will discuss further normalization for ¬1NF relations in Chapter 9.

We  would  like  to  ensure  that  given  a  relation  in  PNF  when  we  apply 
a  nest  or  an  unnest  operator  then  we  get  a  PNF  relation  in  return.  In  general 
this  is  true  only  for  the  unnest  operator.  The  nest  operator  returns  a  PNF 
relation  if  and  only  if  certain  functional  dependencies  hold  in  the  relation  and 
each  nested  relation. 

Theorem  5-1.  The  class  of  PNF  relations  is  closed  under  unnesting. 

Proof: Let $\mathcal{R}$ be any relation structure $\mathcal{R} = (R, r)$ with attribute set $E_R$ containing higher order attribute B with scheme $B = (B_1, B_2, \ldots, B_q)$. We show that $\mathcal{R}' = \mu_{B=(B_1,B_2,\ldots,B_q)}(\mathcal{R})$ is a PNF relation.

Since $\mathcal{R}$ is in PNF we know that $A_1 A_2 \cdots A_n \rightarrow E_R$ where the $A_i$, $1 \le i \le n$, are the zero order attributes in $E_R$. We also have that in each nested relation B, $B_1 B_2 \cdots B_\ell \rightarrow E_B$ where the $B_i$, $1 \le i \le \ell$, are the zero order attributes in $E_B$.

The attributes of $\mathcal{R}'$ are, by definition of unnest, the attributes $(E_R - B) \cup (B_1 B_2 \cdots B_q)$. These attributes can be partitioned into four sets: the zero order attributes of $E_R$ ($A_1 A_2 \cdots A_n$), the higher order attributes in $E_R - B$ ($X_1 X_2 \cdots X_m$), the zero order attributes of $E_B$ ($B_1 B_2 \cdots B_\ell$), and the higher order attributes of $E_B$ ($Y_1 Y_2 \cdots Y_p$). Our task then is to show that for any tuples $t_1$ and $t_2$, if $t_1[A_1 A_2 \cdots A_n B_1 B_2 \cdots B_\ell] = t_2[A_1 A_2 \cdots A_n B_1 B_2 \cdots B_\ell]$ then

\[ t_1[X_1 X_2 \cdots X_m Y_1 Y_2 \cdots Y_p] = t_2[X_1 X_2 \cdots X_m Y_1 Y_2 \cdots Y_p]. \]

Since $A_1 A_2 \cdots A_n \rightarrow X_1 X_2 \cdots X_m$ in $\mathcal{R}$, and unnesting only duplicates these values, we have that $t_1[X_1 X_2 \cdots X_m] = t_2[X_1 X_2 \cdots X_m]$. Since $t_1$ and $t_2$ agree on $A_1 A_2 \cdots A_n$, they came from the same tuple of r, and in this tuple $B_1 B_2 \cdots B_\ell \rightarrow Y_1 Y_2 \cdots Y_p$. So in the set of tuples obtained after unnesting the same FD applies and since $t_1$ agrees with $t_2$ on $B_1 B_2 \cdots B_\ell$, $t_1[Y_1 Y_2 \cdots Y_p] = t_2[Y_1 Y_2 \cdots Y_p]$. □

Theorem 5-2. The nesting of a PNF relation is in PNF iff in the PNF relation $\mathcal{R} = (R, r)$, $A_1 A_2 \cdots A_k \rightarrow X_1 X_2 \cdots X_\ell$, where $A_1, A_2, \ldots, A_k$ are the zero order attributes in $E_R$ not being nested and $X_1, X_2, \ldots, X_\ell$ are the higher order attributes in $E_R$ not being nested.

Proof: We show that $\mathcal{R}' = \nu_{X_0=(A_{k+1},A_{k+2},\ldots,A_n,X_{\ell+1},X_{\ell+2},\ldots,X_m)}(\mathcal{R})$ is in PNF if and only if $A_1 A_2 \cdots A_k \rightarrow X_1 X_2 \cdots X_\ell$, where $A_1, A_2, \ldots, A_n$ are the zero order attributes in $E_R$ and $X_1, X_2, \ldots, X_m$ are the higher order attributes in $E_R$.

if: We prove that, if $A_1 A_2 \cdots A_k \rightarrow X_1 X_2 \cdots X_\ell$ then $\mathcal{R}'$ is in PNF. We utilize a case analysis on the values of m, n, k, and $\ell$. Note that either $k < n$ or $\ell < m$ if we are nesting something.

Case 1: $m = 0$, $n > 0$. Then we have a 1NF relation and by definition of nest the relation is partitioned by the nonnested attributes $A_1 A_2 \cdots A_k$. So $A_1 A_2 \cdots A_k \rightarrow X_0$ in $\mathcal{R}'$ and thus $\mathcal{R}'$ is in PNF.

Case 2: $m > 0$, $n = 0$. Then there is one tuple in the relation as the FD $\emptyset \rightarrow X_1 X_2 \cdots X_m$ holds. Nesting cannot produce fewer tuples and any nested relation created can only have one tuple so the new relation is in PNF.

Case 3: $m > 0$, $n > 0$, $k < n$, $\ell = m$. Then we are nesting only zero order attributes. So $A_1 A_2 \cdots A_k \rightarrow X_1 X_2 \cdots X_m$. Then in each partition on $A_1 A_2 \cdots A_k$ the $X_1 X_2 \cdots X_m$ values will be the same so a partition on $A_1 A_2 \cdots A_k X_1 X_2 \cdots X_m$, used by the nest, will be isomorphic to a partition on $A_1 A_2 \cdots A_k$. The nest will form a set $X_0$ of $A_{k+1} A_{k+2} \cdots A_n$ values in each partition and the FD $A_1 A_2 \cdots A_k X_1 X_2 \cdots X_m \rightarrow X_0$ will hold. So $A_1 A_2 \cdots A_k \rightarrow X_0 X_1 X_2 \cdots X_m$, giving a relation in PNF.

Case 4: $m > 0$, $n > 0$, $k = n$, $\ell < m$. Then we are nesting only higher order attributes. So $A_1 A_2 \cdots A_n \rightarrow X_1 X_2 \cdots X_\ell$. Nesting will be done by grouping $X_{\ell+1} X_{\ell+2} \cdots X_m$ in each tuple, since $A_1 A_2 \cdots A_n$ will continue to form a tuple-wise partition. So $A_1 A_2 \cdots A_n \rightarrow X_0 X_1 X_2 \cdots X_\ell$, giving a relation in PNF.

Case 5: $m > 0$, $n > 0$, $k < n$, $\ell < m$. Then we are nesting some zero order and some higher order attributes. So $A_1 A_2 \cdots A_k \rightarrow X_1 X_2 \cdots X_\ell$. Then during nesting a partition on $A_1 A_2 \cdots A_k X_1 X_2 \cdots X_\ell$ will be created and by definition each set $X_0$ of $A_{k+1} A_{k+2} \cdots A_n X_{\ell+1} X_{\ell+2} \cdots X_m$ values will be uniquely determined by $A_1 A_2 \cdots A_k X_1 X_2 \cdots X_\ell$. Thus, $A_1 A_2 \cdots A_k \rightarrow X_0 X_1 X_2 \cdots X_\ell$. In each new nested relation the $A_{k+1} A_{k+2} \cdots A_n$ values are unique since $A_1 A_2 \cdots A_k$ was the same for each of these tuples and $A_1 A_2 \cdots A_n$ values were unique as $\mathcal{R}$ is in PNF. Thus $A_{k+1} A_{k+2} \cdots A_n \rightarrow X_{\ell+1} X_{\ell+2} \cdots X_m$ in each nested relation. Thus the relation is in PNF.

only if: We prove that if $\mathcal{R}'$ is in PNF then $A_1 A_2 \cdots A_k \rightarrow X_1 X_2 \cdots X_\ell$.

Since $A_1 A_2 \cdots A_k$ are the zero order attributes of $\mathcal{R}'$, by definition of PNF $A_1 A_2 \cdots A_k \rightarrow E_{R'}$ holds in $\mathcal{R}'$. By the projectivity FD axiom, $A_1 A_2 \cdots A_k \rightarrow X_1 X_2 \cdots X_\ell$.

Therefore, $\mathcal{R}'$ is in PNF iff $A_1 A_2 \cdots A_k \rightarrow X_1 X_2 \cdots X_\ell$. □

5.2  Extending  the  Basic  Relational  Algebra  Operators 

As the example in section 5.1 showed, we need to extend our basic algebra operators to work within the class of PNF relations. We first extend the traditional set operators (union, intersection, difference, and cartesian product), and then extend natural join and projection. Some of these operators are similar to the extended operators of [AB2]. However, our definitions arose out of the PNF requirement and since our model does not include null values or empty sets, the operations are well defined. In [AB2], empty sets are allowed but null values are not, so there are problems when tuples with empty sets are unnested. Unlike [AB2], we do not extend selection in this dissertation. We note also that the extended operators can be applied to non-PNF relations in a well defined way; however, the result is not necessarily a PNF relation.

We find that there is not much correspondence between the way most of the relational algebra operators work on 1NF relations and the way they work on the counterpart ¬1NF relations.

Example 5.1: Consider ¬1NF relations $r_1$ and $r_2$ of Figure 5-4 and their 1NF counterparts, $s_1$ and $s_2$. Note, however, that $r_1 \cap r_2$ is not the ¬1NF counterpart of $s_1 \cap s_2$, as the usual definition of intersection requires that a tuple is in the result only if that tuple is in both input relations. □

We believe that each 1NF operator should have a reasonable ¬1NF counterpart. Intuitively, a ¬1NF operator is reasonable if it behaves identically to the corresponding 1NF operator on 1NF relations and if it produces a result which would have been produced had the equivalent set of 1NF relations been used instead of ¬1NF relations. We now formally define reasonable in terms of faithfulness and precision.

Let Rel be the set of all 1NF relations and let Rel* be the set of all ¬1NF relations that have at least one higher order attribute in the scheme. Thus, $Rel \cap Rel^* = \emptyset$.
Figure 5-4. Intersection applied to ¬1NF and 1NF relations.

Definition 5.2: Let $\gamma$ be an operator on Rel and let $\gamma'$ be an operator on $Rel^* \cup Rel$. We say that $\gamma'$ is faithful to $\gamma$ if one of the following two conditions holds:

1. when $\gamma$ and $\gamma'$ are unary operators, $\gamma(r) = \gamma'(r)$ for every $r \in Rel$ for which $\gamma(r)$ is defined.

2. when $\gamma$ and $\gamma'$ are binary operators, $r \mathbin{\gamma} q = r \mathbin{\gamma'} q$ for every $r, q \in Rel$ for which $r \mathbin{\gamma} q$ is defined.

Definition 5.3: Let $\gamma$ be an operator on Rel and let $\gamma'$ be an operator on Rel*. We say that $\gamma'$ is a precise generalization of $\gamma$ relative to unnesting if one of the following two conditions holds:

1. when $\gamma$ and $\gamma'$ are unary operators, $\mu^*(\gamma'(r)) = \gamma(\mu^*(r))$ for every $r \in Rel^*$ for which $\gamma'(r)$ is defined.

2. when $\gamma$ and $\gamma'$ are binary operators, $\mu^*(r \mathbin{\gamma'} q) = \mu^*(r) \mathbin{\gamma} \mu^*(q)$ for every $r, q \in Rel^*$ for which $r \mathbin{\gamma'} q$ is defined.

We now define ¬1NF operators which are faithful and precise and also have some intuition behind them.

5.2.1  Extended  Union 

In  order  to  take  the  extended  union  of  two  relations  ri  and  r2  we  require  that 
they  have  equal  relation  schemes,  say  R.  The  scheme  of  the  resultant  structure 
is  also  equal  to  R.  We  define  extended  union  at  the  instance  level  as  follows. 

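Definition 5.4: Let $r_1$ and $r_2$ be relations on scheme R. Let X range over the zero order attributes in $E_R$ and Y range over the higher order attributes in $E_R$. The extended union of $r_1$ and $r_2$ is:

\[
\begin{aligned}
r_1 \cup^e r_2 = \{ t \mid\ & (\exists t_1 \in r_1 \wedge \exists t_2 \in r_2 : (\forall X, Y \in E_R : t[X] = t_1[X] = t_2[X] \\
& \quad \wedge\ t[Y] = (t_1[Y] \cup^e t_2[Y]))) \\
\vee\ & (t \in r_1 \wedge (\forall t' \in r_2 : (\forall X \in E_R : t[X] \neq t'[X]))) \\
\vee\ & (t \in r_2 \wedge (\forall t' \in r_1 : (\forall X \in E_R : t[X] \neq t'[X]))) \}
\end{aligned}
\]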

Note,  this  definition  is  recursive  in  that  we  apply  the  extended  union 
to  each  higher  order  attribute  Y. 
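The recursion in Definition 5.4 can be sketched in Python as follows. Several things here are assumptions of the illustration rather than part of the definition: tuples are frozensets of (attribute, value) pairs, nested relations are frozensets, both operands are assumed to be in PNF (so the atomic part of a tuple identifies it), and the second and third disjuncts are read as "no tuple of the other relation has the same zero order values". The names `row`, `courses`, `atomic_part` and `ext_union` are hypothetical.

```python
def row(**attrs):
    """A tuple is an immutable set of (attribute name, value) pairs."""
    return frozenset(attrs.items())


def atomic_part(t):
    """The zero order (attribute, value) pairs of a tuple."""
    return frozenset((a, v) for a, v in t if not isinstance(v, frozenset))


def ext_union(r1, r2):
    """Sketch of the extended union of two PNF relations on one scheme."""
    r2_by_key = {atomic_part(t): t for t in r2}
    matched = set()
    out = set()
    for t1 in r1:
        key = atomic_part(t1)
        t2 = r2_by_key.get(key)
        if t2 is None:
            out.add(t1)                      # only in r1: kept intact
            continue
        matched.add(key)
        d2 = dict(t2)
        merged = set(key)
        for a, v in t1:
            if isinstance(v, frozenset):     # recurse on higher order attrs
                merged.add((a, ext_union(v, d2[a])))
        out.add(frozenset(merged))
    # tuples of r2 whose zero order values match nothing in r1
    out.update(t for t in r2 if atomic_part(t) not in matched)
    return frozenset(out)


def courses(*pairs):
    """Build a nested Course relation from (cname, grade) pairs."""
    return frozenset(row(cname=c, grade=g) for c, g in pairs)


S1 = frozenset({
    row(sname="Jones", Course=courses(("Math", "A"), ("Science", "B"))),
    row(sname="Smith",
        Course=courses(("Math", "A"), ("Physics", "C"), ("Science", "A"))),
})
S2 = frozenset({
    row(sname="Jones", Course=courses(("Physics", "B"))),
    row(sname="Smith", Course=courses(("Chemistry", "A"), ("English", "B"))),
})
combined = ext_union(S1, S2)   # one tuple per student, as in Figure 5-3
```

On relations without higher order attributes the atomic part of a tuple is the tuple itself, so the sketch degenerates to ordinary set union.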
Figure 5-5. Counterexample to preciseness of $\cup^e$.

Proposition  5.1:  Extended  union  is  faithful  to  standard  union. 

Proof: The definition of $\cup^e$ differs from the definition of $\cup$ only when higher order attributes are present in the scheme. When there are no higher order attributes, as in Rel, then the definition of $\cup^e$ reduces to a selection of tuples that are in both relations or are tuples in only one of the two relations, i.e., a standard union. □

Proposition  5.2:  Extended  union  is  not  a  precise  generalization  of  standard 
union  with  respect  to  unnesting. 

Proof: Figure 5-5 shows two ¬1NF relations $r_1$ and $r_2$ where $\mu^*(r_1 \cup^e r_2) \neq \mu^*(r_1) \cup \mu^*(r_2)$. □

Extended union is not precise due to the syntactic nature of standard union. Standard union does not take into account dependencies that should exist in a relation if it is going to be nested. If we agree that only relations from Rel* which are in PNF should be allowed, then each nesting scheme is allowed if and only if certain multivalued dependencies hold in the completely unnested relation.

The intuition behind the extended union, and, as we will see, the other extended operators, is to take advantage of the MVDs which allow us to nest relations and maintain partitioned normal form. For instance, in the example of Figure 5-5 the MVD $A \rightarrow\rightarrow B \mid C$ holds. Thus, B and C values are only indirectly related through the A attribute, and the primary associations are AB and AC. Since we can think of union as an insertion operation, we would like to be able to insert AB and AC associations independently of each other.

With a 1NF relation on ABC this is not possible unless we specify every existing AC association for an A-value whenever we add a new AB association for that A-value, and vice versa. However, in the ¬1NF relation each A value functionally determines a B* and C* set, and each set can be independently updated with our extended union. A similar result can be achieved by decomposing each ABC relation into AB and AC, which is the path set for this scheme (see section 3.4). We then perform a standard union among the corresponding decomposed relations, and finally rejoin. Proposition 3.1 ensures that the same MVDs will hold in the new result and so the same nesting structure will be possible.


If  we  use  a  modified  version  of  standard  union  which  takes  into  account 
the  MVDs  or,  equivalently,  the  join  dependency  which  produces  the  nested 
structure,  then  we  have  a  precise  extended  union  operator. 

Definition 5.5: Let $\bowtie(X_1, X_2, \ldots, X_n)$ be a join dependency on scheme R with zero order attributes $E_R = X_1 \cup X_2 \cup \cdots \cup X_n$. The decomposition union (or $\Delta$-union) of two 1NF relations $r_1$ and $r_2$ on R is

\[ r_1 \cup^{\Delta} r_2 = \bowtie(r_1[X_1] \cup r_2[X_1],\ r_1[X_2] \cup r_2[X_2],\ \ldots,\ r_1[X_n] \cup r_2[X_n]) \]

where $\bowtie$ is the standard natural join.
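A minimal Python sketch of the decomposition union follows. The representation (tuples as frozensets of attribute/value pairs) and the helper names `row`, `project`, `natural_join` and `delta_union` are assumptions of this illustration, and the sample data is hypothetical; it is not the content of Figure 5-5.

```python
from functools import reduce


def row(**attrs):
    """A tuple is an immutable set of (attribute name, value) pairs."""
    return frozenset(attrs.items())


def project(r, attrs):
    """Standard projection r[attrs] on a 1NF relation."""
    return frozenset(frozenset((a, v) for a, v in t if a in attrs) for t in r)


def natural_join(r, s):
    """Standard natural join: combine tuples agreeing on shared attributes."""
    out = set()
    for t in r:
        dt = dict(t)
        for u in s:
            du = dict(u)
            if all(dt[a] == du[a] for a in dt.keys() & du.keys()):
                out.add(t | u)
    return frozenset(out)


def delta_union(r1, r2, components):
    """Union the projections component by component, then rejoin them."""
    parts = [project(r1, X) | project(r2, X) for X in components]
    return reduce(natural_join, parts)


# Hypothetical relations on ABC with join dependency (AB, AC).
r1 = frozenset({row(A="a", B="b", C="c")})
r2 = frozenset({row(A="a", B="b2", C="c2")})
result = delta_union(r1, r2, [("A", "B"), ("A", "C")])
```

With this data the standard union holds two tuples, while the decomposition union holds four: the AB and AC associations for the shared A value are inserted independently and then recombined by the join.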

Proposition 5.3: Extended union is a precise generalization of $\Delta$-union with respect to unnesting, where the join dependency used in the $\Delta$-union is the path set of the ¬1NF relation's scheme tree.

Proof: We need to show that $\mu^*(r) \cup^{\Delta} \mu^*(q) = \mu^*(r \cup^e q)$ for any $r, q \in Rel^*$ for which $r \cup^e q$ is defined, i.e., r and q have identical relation schemes. We show inclusion both ways to prove the equivalence.

$\subseteq$ Let t be a tuple in $\mu^*(r) \cup^{\Delta} \mu^*(q)$. Two cases need to be considered: either t came only from tuples in one of $\mu^*(r)$ or $\mu^*(q)$, or t is a combination of tuples from $\mu^*(r)$ and $\mu^*(q)$, put together via the join operation in the $\Delta$-union.

Case 1: Suppose t came directly from tuples in $\mu^*(r)$. The argument for q is symmetrical. Due to the join dependency holding in $\mu^*(r)$, all of these tuples agree on the join attributes which are the non-leaf nodes in the scheme tree for r. Thus, we know that there is one tuple in $\mu^*(r)$ which decomposed and rejoined to make t. This tuple unnested from a single tuple $t_r$ in r. Now any tuple in r must either be intact in $r \cup^e q$ if there was no tuple in q with the same partition key, or there is some tuple t' in $r \cup^e q$ in which each nested relation of $t_r$ is a subset of the corresponding nested relation in t'. In either case, unnesting $r \cup^e q$ will return the original tuple t.

Case 2: If t was created by taking pieces of tuples from both $\mu^*(r)$ and $\mu^*(q)$, as in Case 1, the tuples from which it came must agree on the non-leaf nodes in the scheme tree for r and q. Thus the tuples from r and q which unnested to these tuples interact in the extended union of r and q which, when unnested, must contain the tuple t.

$\supseteq$ Let T be a set of tuples in $\mu^*(r \cup^e q)$ such that all tuples in T unnested from a single tuple t in $r \cup^e q$. Two cases need to be considered: either t comes only from r or q, or t is a combination of tuples $t_r$ in r and $t_q$ in q.

Case 1: Suppose t came only from r. The argument for q is symmetrical. All tuples in T will get decomposed and rejoined by the $\Delta$-union, plus perhaps participating with other tuples in the join. But at least the original tuples will be returned, so all tuples in T are in the left hand side.

Case 2: Each tuple in T may take some of its values from attributes in $t_r$ or $t_q$, but if the values of some attributes are different, then the attributes which are above that attribute in the scheme tree have equal values. This is exactly how the unnested tuples of $t_r$ and $t_q$ will interact in the join operation of the $\Delta$-union. So every tuple in T will be the join of pieces from an unnested tuple $t_r$ of r and an unnested tuple $t_q$ of q. □

We  note  that  Proposition  5.3  gives  us  a  method  for  expressing  extended  union  in 
terms  of  the  basic  algebra  operators.  The  operands  must  have  known  schemes 
and  so  we  can  use  the  path  set  of  the  associated  scheme  tree  to  perform  the 
projection  involved  in  the  decomposition  union.  The  sequence  of  operations 
would  be  to  completely  unnest  each  operand,  project  using  the  components 
determined  by  the  path  set,  union  each  of  the  corresponding  components  of 
each  operand,  join  the  new  components  (using  select  and  cartesian  product), 
and  nest  to  gain  the  original  structure.  If  the  operands  were  not  in  PNF,  then 
appropriate  algebra  operators  could  be  used  to  add  a  key  to  each  relation  or 
nested  relation  so  that  the  relations  are  in  PNF  (see  Chapter  6),  the  above 
procedure  applied,  and  then  the  keys  removed. 
5.2.2 Extended Intersection

Extended intersection has the same scheme requirements as extended union.

Two tuples intersect if they agree on their zero order attributes and they have non-empty extended intersections of their higher order attributes. Since we do not allow empty nested relations to appear in the ¬1NF model without null values, a tuple with an empty extended intersection of some higher order attributes must be eliminated. This is also critical if extended intersection is to be a precise generalization of standard intersection.

Definition 5.6: Let $r_1$ and $r_2$ be relations on scheme R. Let X range over the zero order attributes in $E_R$ and Y range over the higher order attributes in $E_R$. The extended intersection of $r_1$ and $r_2$ is:

\[
\begin{aligned}
r_1 \cap^e r_2 = \{ t \mid\ & (\exists t_1 \in r_1 \wedge \exists t_2 \in r_2 : (\forall X, Y \in E_R : t[X] = t_1[X] = t_2[X] \\
& \quad \wedge\ t[Y] = (t_1[Y] \cap^e t_2[Y]) \wedge t[Y] \neq \emptyset)) \}
\end{aligned}
\]
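Definition 5.6 can be sketched in Python in the same style. The representation (tuples as frozensets of attribute/value pairs, nested relations as frozensets), the assumption that both operands are in PNF, and the helper names `row`, `courses`, `atomic_part`, `unnest` and `ext_intersection` are all choices made for this illustration; the sample data is hypothetical.

```python
def row(**attrs):
    """A tuple is an immutable set of (attribute name, value) pairs."""
    return frozenset(attrs.items())


def atomic_part(t):
    """The zero order (attribute, value) pairs of a tuple."""
    return frozenset((a, v) for a, v in t if not isinstance(v, frozenset))


def ext_intersection(r1, r2):
    """Sketch of the extended intersection of two PNF relations."""
    r2_by_key = {atomic_part(t): t for t in r2}
    out = set()
    for t1 in r1:
        key = atomic_part(t1)
        t2 = r2_by_key.get(key)
        if t2 is None:
            continue                          # no agreement on zero order attrs
        d2 = dict(t2)
        merged = set(key)
        keep = True
        for a, v in t1:
            if isinstance(v, frozenset):
                inner = ext_intersection(v, d2[a])
                if not inner:                 # empty nested result: drop tuple
                    keep = False
                    break
                merged.add((a, inner))
        if keep:
            out.add(frozenset(merged))
    return frozenset(out)


def unnest(r, name):
    """Emit one flat tuple per member of the nested relation under `name`."""
    out = set()
    for t in r:
        d = dict(t)
        rest = frozenset((a, v) for a, v in d.items() if a != name)
        out.update(rest | inner for inner in d[name])
    return frozenset(out)


def courses(*pairs):
    """Build a nested Course relation from (cname, grade) pairs."""
    return frozenset(row(cname=c, grade=g) for c, g in pairs)


q1 = frozenset({
    row(sname="Jones", Course=courses(("Math", "A"), ("Science", "B"))),
    row(sname="Smith", Course=courses(("Math", "A"))),
})
q2 = frozenset({
    row(sname="Jones", Course=courses(("Math", "A"), ("Physics", "B"))),
    row(sname="Smith", Course=courses(("Physics", "C"))),
})
common = ext_intersection(q1, q2)
```

For this data the Smith tuple is eliminated because its Course sets have nothing in common, and unnesting `common` gives the same flat tuples as intersecting the unnested operands, which is the precision property stated in Proposition 5.5.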

Proposition  5.4:  Extended  intersection  is  faithful  to  standard  intersection. 


Proof: As in the proof for union, the definition of $\cap^e$ differs from the definition of $\cap$ only when higher order attributes are in the scheme. When only relations in Rel are being considered, the definition of $\cap^e$ reduces to the definition of standard intersection. □


Proposition  5.5:  Extended  intersection  is  a  precise  generalization  of  standard 
intersection  with  respect  to  unnesting. 


Proof: We need to show that $\mu^*(r) \cap \mu^*(q) = \mu^*(r \cap^e q)$ for any $r, q \in Rel^*$ for which $r \cap^e q$ is defined. We show inclusion both ways to prove the equivalence.

$\subseteq$ Let t be a tuple in $\mu^*(r) \cap \mu^*(q)$. Then, $t \in \mu^*(r)$ and $t \in \mu^*(q)$. Now, t unnested from some tuple $t_r \in r$ and some tuple $t_q \in q$. Furthermore, $t_r$ and $t_q$ agree on the attributes which are the non-leaf nodes in the scheme tree for r and q. Therefore, when $r \cap^e q$ is calculated, $t_r$ and $t_q$ will participate in the result, and when unnested, will produce the tuple t. Thus, $t \in \mu^*(r \cap^e q)$.

$\supseteq$ Let T be a set of tuples in $\mu^*(r \cap^e q)$ such that all tuples in T unnested from a single tuple t in $r \cap^e q$. Then, all tuples in T agree with t on the attributes which are the non-leaf nodes in the scheme tree for r and q. Furthermore, the only values of attributes which are leaf nodes, which are in tuples of T, are those that were in both the r and q tuples which participated to form t. Thus, a tuple is in T exactly when it agrees with some tuple unnested from r and some tuple unnested from q. That is, $\forall t' \in T : t' \in \mu^*(r) \cap \mu^*(q)$. □

We note that a $\Delta$-intersection operator could be defined in a similar manner to $\Delta$-union, although it is not necessary as $r_1 \cap r_2 = r_1 \cap^{\Delta} r_2$ for any $r_1, r_2 \in Rel$. Also, the comments made about expressing extended union in terms of the basic relational algebra operators can also be applied to extended intersection except that the decomposition and join steps are not required in the transformation.


5.2.3 Extended Difference

The extended difference operator has semantic complications similar to extended
union. Extended difference also has the same scheme requirements as union. In
$r_1 -^e r_2$ a tuple is retained from $r_1$ if it does not agree with any tuple in
$r_2$ on the zero order attributes, or, if it does, then it has non-empty
extended differences between the higher order attributes. Our comments on
empty nested relations from section 5.2.2 apply here as well.

Definition 5.7: Let $r_1$ and $r_2$ be relations on scheme $R$. Let $X$ range over
the zero order attributes in $E_R$ and $Y$ and $Z$ range over the higher order
attributes in $E_R$. The extended difference of $r_1$ and $r_2$ is:

$r_1 -^e r_2 = \{ t \mid (\exists t_1 \in r_1 \wedge \exists t_2 \in r_2 \wedge \exists Z \in E_R : (\forall X, Y \in E_R : t[X] = t_1[X] = t_2[X] \wedge t[Y] = (t_1[Y] -^e t_2[Y]) \wedge t[Z] \neq \emptyset))$
$\vee\ (t \in r_1 \wedge (\forall t' \in r_2 : (\forall X \in E_R : t[X] \neq t'[X]))) \}$

Proposition  5.6:  Extended  difference  is  faithful  to  standard  difference. 

Proof:  Similar  to  proofs  for  union  and  intersection.  □ 

Proposition 5.7: Extended difference is not a precise generalization of
standard difference with respect to unnesting.

Proof: Figure 5-6 shows two ¬1NF relations $r_1$ and $r_2$ where
$\mu^*(r_1 -^e r_2) \neq \mu^*(r_1) - \mu^*(r_2)$. □

Figure 5-6. Counterexample to preciseness of $-^e$

The intuition behind this definition of extended difference is similar
to the intuition behind extended union. We think of difference as the deletion
of information from the database. In the counterexample in Figure 5-6, we
are trying to delete two relationships from $r_1$: the AB association between $a$
and $b'$ and the AC association between $a$ and $c$. Since there is no association
between $a$ and $b'$ in $r_1$, nothing changes due to that request. However, the $a$
to $c$ association is in $r_1$ and so it is removed. In the 1NF versions of $r_1$ and
$r_2$, it is not possible to express only an AB or an AC relationship, but only an
artificial ABC relationship. Thus in order to delete, say, an AC association, we
would have to know all of the B values associated with the A value so all ABC
relationships could be deleted.
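
The behaviour described above can be checked on a small instance. The following Python sketch is illustrative only: the representation (relations as frozensets of tuples, a scheme as a tuple with None for zero order attributes), the function names, and the concrete values are assumptions, and the instance is a hypothetical one chosen to match the prose rather than a reproduction of Figure 5-6. A matched tuple is kept when at least one higher order difference is non-empty.

```python
def ext_diff(r1, r2, scheme):
    """Extended difference of two nested relations on the same scheme.

    A relation is a frozenset of tuples. `scheme` has one entry per
    attribute: None for a zero order attribute, or the nested scheme
    for a higher order attribute. Both inputs are assumed to be in PNF.
    """
    zero = [i for i, s in enumerate(scheme) if s is None]
    high = [i for i, s in enumerate(scheme) if s is not None]
    result = set()
    for t1 in r1:
        key = tuple(t1[i] for i in zero)
        partners = [t2 for t2 in r2 if tuple(t2[i] for i in zero) == key]
        if not partners:
            result.add(t1)           # no agreement on zero order attributes
            continue
        t2 = partners[0]             # PNF: at most one partner per key
        t = list(t1)
        for i in high:
            t[i] = ext_diff(t1[i], t2[i], scheme[i])
        if any(t[i] for i in high):  # some nested difference is non-empty
            result.add(tuple(t))
    return frozenset(result)


def unnest_all(r, scheme):
    """Completely unnest r into a set of flat tuples."""
    rows = set()
    for t in r:
        partial = [()]
        for value, sub in zip(t, scheme):
            choices = [(value,)] if sub is None else unnest_all(value, sub)
            partial = [p + c for p in partial for c in choices]
        rows.update(partial)
    return frozenset(rows)


def rel(*values):
    """Build a unary nested relation from atomic values."""
    return frozenset((v,) for v in values)


SCHEME = (None, (None,), (None,))   # A, a nested relation on B, one on C
r1 = frozenset({("a", rel("b"), rel("c", "c2"))})
r2 = frozenset({("a", rel("b2"), rel("c"))})

nested_then_flat = unnest_all(ext_diff(r1, r2, SCHEME), SCHEME)
flat_difference = unnest_all(r1, SCHEME) - unnest_all(r2, SCHEME)
```

On this instance the extended difference removes only the a-to-c association, while the standard difference of the unnested relations removes nothing, so the two results differ.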

As with union, the problem stems from the MVDs that must exist in
the 1NF counterparts of the ¬1NF relations. Our solution follows the same line
as for union. We first decompose the relation via the join dependency specified
by the scheme tree, perform the difference on the decomposed relations, and
then rejoin.

Definition 5.8: Let $\bowtie (X_1, X_2, \ldots, X_n)$ be a join dependency on scheme $R$
with zero order attributes $E_R = X_1 \cup X_2 \cup \cdots \cup X_n$. The decomposition
difference, or $\Delta$-difference, of two 1NF relations $r_1$ and $r_2$ on $R$ is

$r_1 -^\Delta r_2 = \bowtie (r_1[X_1] - r_2[X_1],\ r_1[X_2] - r_2[X_2],\ \ldots,\ r_1[X_n] - r_2[X_n])$

where $\bowtie$ is the natural join.
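
As a minimal sketch of this definition, the Python below projects both 1NF relations onto each component of the join dependency, takes the standard difference per component, and rejoins. The row encoding (a frozenset of attribute/value pairs), the helper names, and the sample data are assumptions made for illustration.

```python
from functools import reduce


def row(**attrs):
    """A flat tuple, encoded as a frozenset of (attribute, value) pairs."""
    return frozenset(attrs.items())


def project(rel, attrs):
    """Standard projection of a 1NF relation onto the given attributes."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in rel}


def natural_join(r, s):
    """Standard natural join: merge tuples agreeing on shared attributes."""
    out = set()
    for x in r:
        for y in s:
            dx, dy = dict(x), dict(y)
            if all(dx[a] == dy[a] for a in dx.keys() & dy.keys()):
                out.add(frozenset({**dx, **dy}.items()))
    return out


def delta_difference(r1, r2, components):
    """Decompose along the join dependency, subtract per component, rejoin."""
    pieces = [project(r1, X) - project(r2, X) for X in components]
    return reduce(natural_join, pieces)


flat_r1 = {row(A="a", B="b", C="c"), row(A="a", B="b", C="c2")}
flat_r2 = {row(A="a", B="b2", C="c")}
result = delta_difference(flat_r1, flat_r2, [("A", "B"), ("A", "C")])
```

With the join dependency over AB and AC, the a-to-c pair is removed from the AC component even though no complete ABC tuple of the second relation occurs in the first.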

Proposition 5.8: Extended difference is a precise generalization of
$\Delta$-difference with respect to unnesting, where the join dependency used in
the $\Delta$-difference is the path set of the ¬1NF relation's scheme tree.

Proof: We need to show that $\mu^*(r) -^\Delta \mu^*(q) = \mu^*(r -^e q)$ for any
$r, q \in Rel^*$ for which $r -^e q$ is defined, i.e., $r$ and $q$ have identical
relation schemes. We show inclusion both ways to prove the equivalence.

$\subseteq$ Let $t$ be a tuple in $\mu^*(r) -^\Delta \mu^*(q)$. Two cases need to be
considered: either $t$ came only from tuples in $\mu^*(r)$, or $t$ is a combination
of tuples from $\mu^*(r)$ and $\mu^*(q)$, put together via the join operation in the
$\Delta$-difference.

Case 1: Suppose $t$ came directly from tuples in $\mu^*(r)$. Due to the
join dependency holding in $\mu^*(r)$, all of these tuples agree on the
join attributes which are the non-leaf nodes in the scheme tree for $r$.
Thus, we know that there is one tuple in $\mu^*(r)$ which decomposed and
rejoined to make $t$. This tuple unnested from a single tuple $t_r$ in $r$.
Since $t$ came directly from $\mu^*(r)$, it was not affected by tuples in $q$.
So $t_r \in r -^e q$, and unnesting $r -^e q$ will return the original tuple $t$.

Case 2: If $t$ was created by taking pieces of tuples from $\mu^*(r)$ that
were not in $\mu^*(q)$, as in Case 1, the tuples from which it came must
agree on the non-leaf nodes in the scheme tree for $r$ and $q$. Thus the
tuples from $r$ and $q$ which unnested to these tuples interact in the
extended difference of $r$ and $q$ which, when unnested, must contain
the tuple $t$.

$\supseteq$ Let $T$ be a set of tuples in $\mu^*(r -^e q)$ such that all tuples in $T$
unnested from a single tuple $t$ in $r -^e q$. Two cases need to be considered:
either $t$ comes only from $r$, or $t$ is a combination of tuples $t_r$ in $r$ and
$t_q$ in $q$.

Case 1: Suppose $t$ came only from $r$. All tuples in $T$ will get decomposed
and rejoined by the $\Delta$-difference. Thus, the original tuples will
be returned, so all tuples in $T$ are in the left hand side.

Case 2: Each tuple in $T$ may take some of its values from attributes
in $t_r$ that are not in $t_q$, but only if the attributes which are above that
attribute in the scheme tree have equal values. This is exactly how
the unnested tuples of $t_r$ and $t_q$ will interact in the join operation of
the $\Delta$-difference. So every tuple in $T$ will be the join of pieces from
an unnested tuple $t_r$ of $r$ that are not in an unnested tuple $t_q$ of $q$. □

We  note  that  a  procedure  similar  to  that  used  to  express  extended  union  in 
terms  of  the  basic  relational  algebra  operators  can  be  applied  to  extended 
difference. 

5.2.4  Cartesian  Product  and  Select 

The standard product and select operators can be used on ¬1NF relations.
Since nest is an inverse for unnest when dealing with PNF relations, when
products or selections on tuples within nested relations are desired, the
appropriate attributes can be unnested, the operation performed, and the
relation renested according to the user's desires.
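
The unnest, operate, renest pattern can be sketched as follows. This is a minimal illustration assuming a relation on scheme (A, X) whose nested attribute X holds a non-empty set of atomic values; the function names and sample data are hypothetical.

```python
from collections import defaultdict


def unnest(rel):
    """Unnest X: one flat (a, b) tuple per element of each nested set."""
    return {(a, b) for a, bs in rel for b in bs}


def nest(flat):
    """Nest on the second attribute, grouping by the first."""
    groups = defaultdict(set)
    for a, b in flat:
        groups[a].add(b)
    return {(a, frozenset(bs)) for a, bs in groups.items()}


def select_inside(rel, predicate):
    """Select on tuples within the nested relation, then renest."""
    return nest({(a, b) for a, b in unnest(rel) if predicate(b)})


r = {("x", frozenset({1, 2, 3})), ("y", frozenset({1}))}
selected = select_inside(r, lambda b: b >= 2)
```

Because the sample relation is in PNF and has no empty nested sets, `nest(unnest(r))` returns `r` unchanged. The tuple for "y" disappears from `selected` because none of its nested values survive the selection.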

More sophisticated predicates for select could be defined using set
comparison operators (see [AB2, Schl]); however, these operators do not have
a simple mapping to standard select. In fact, set comparisons in the standard
algebra usually require a combination of product, select, and project operators.
There is a proposal for a recursive algebra [Jae3] in which the standard
operators are applied to nested relations in recursively constructed queries.
These extensions appear to be precise generalizations; however, a recursive
algebra is beyond the scope of this dissertation.


5.2.5  Extended  Natural  Join 

Join operations are difficult to define in the ¬1NF model due to the possibility
of different nesting depths for the attributes. The problems with an extended
natural join ($\bowtie^e$) can be illustrated as follows.

Let $r_1$ be a relation on $R_1 = (A, X)$, $X = (B, C)$, and let $r_2$ be a
relation on $R_2 = (B, D)$. Then $r_1 \bowtie^e r_2$ is the cartesian product of $r_1$
and $r_2$ since $E_{R_1} \cap E_{R_2} = \emptyset$. However, in the 1NF counterparts of
$r_1$ and $r_2$, attribute $B$ is a common attribute, so a join on $B$ must take place.
Thus, we limit the relations which can participate in an extended natural join
to those whose only common attributes are elements of the top level scheme,
i.e., in $E_R$ for scheme $R$, or are attributes of a common higher order
attribute. With a recursive algebra as discussed above, more general join
operations could be defined.

Let $r_1$ be a relation on scheme $R_1$ and $r_2$ a relation on scheme $R_2$. We
define the extended natural join $r_1 \bowtie^e r_2$ as a recursive application of a
rule similar to the definition of natural join used for standard 1NF relations.

In the standard natural join, two tuples contribute to the join if they
agree on the attributes in common to both schemes. Under extended natural
join, two tuples contribute to the join if the extended intersection of their
projections over common attributes is not empty.

Definition 5.9: Let $X$ be the higher order attributes in $E_{R_1} \cap E_{R_2}$,
$A = E_{R_1} - X$, and $B = E_{R_2} - X$. Then the extended natural join of $r_1$ and
$r_2$ is $r_1 \bowtie^e r_2$, which produces a relation $r$ on scheme $R$ where:

1. $R = (A, X, B)$, and

2. $r = \{ t \mid (\exists u \in r_1, v \in r_2 : t[A] = u[A] \wedge t[B] = v[B] \wedge t[X] = (u[X] \cap^e v[X]) \wedge t[X] \neq \emptyset) \}$

Proposition  5.9:  Extended  natural  join  is  faithful  to  standard  natural  join. 

Proof:  If  there  are  no  higher  order  attributes,  then  X  is  empty,  and  the 
definition  of  extended  natural  join  reduces  to  the  definition  of  standard  natural 
join.  □ 

Proposition 5.10: Extended natural join is a precise generalization of standard
natural join with respect to unnesting.

Proof: We need to show that $\mu^*(r) \bowtie \mu^*(q) = \mu^*(r \bowtie^e q)$ for any
$r, q \in Rel^*$ for which $r \bowtie^e q$ is defined. We show inclusion both ways to
prove the equivalence.

$\subseteq$ Let $t$ be a tuple in $\mu^*(r) \bowtie \mu^*(q)$. Then $t$ agrees on all zero
order attributes common to $r$ and $q$ and all attributes which unnested from
common higher order attributes in $r$ and $q$. Let $t_r \in r$ and $t_q \in q$
be the tuples that unnested to participate in producing $t$. In $r \bowtie^e q$,
we will take the extended intersection of the common higher order
attributes of $r$ and $q$, producing only those values common to both.
Since $t_r$ and $t_q$ agree on all attributes which unnest from the common
higher order attributes, they will participate in the extended
intersection, and when we unnest this result, the tuple $t$ will appear.

$\supseteq$ Let $T$ be a set of tuples in $\mu^*(r \bowtie^e q)$ such that all tuples in
$T$ unnested from a single tuple $t$ in $r \bowtie^e q$. Then there are tuples
$t_r \in r$ and $t_q \in q$ that participated to make $t$. These tuples agree on the
common zero order attributes of $r$ and $q$. Furthermore, $t$ contains only values
in the common higher order attributes that are in both $t_r$ and $t_q$. Thus,
when we unnest $t_r$ and $t_q$ and join the result, we match up only on those
same common values. Thus, all tuples of $T$ are also in $\mu^*(r) \bowtie \mu^*(q)$. □
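
Definition 5.9 and the preciseness property can be exercised on a small instance. The sketch below assumes the simplest case: a single common higher order attribute X holding a set of atomic values, so that extended intersection on X reduces to ordinary set intersection. Tuples are Python dicts, the unnested attribute keeps the name X for brevity, and all names and data are hypothetical.

```python
def ext_join(r1, r2, x):
    """Extended natural join with one common higher order attribute x."""
    out = []
    for u in r1:
        for v in r2:
            shared_zero = (u.keys() & v.keys()) - {x}
            if any(u[a] != v[a] for a in shared_zero):
                continue                  # disagree on common zero order attrs
            common = u[x] & v[x]          # extended intersection, one level
            if common:                    # t[X] must be non-empty
                out.append({**u, **v, x: common})
    return out


def unnest(rel, x):
    """Unnest attribute x; flat tuples become frozensets of items."""
    return {frozenset({**t, x: e}.items()) for t in rel for e in t[x]}


def flat_join(r, s):
    """Standard natural join on flat tuples."""
    out = set()
    for p in r:
        for q in s:
            dp, dq = dict(p), dict(q)
            if all(dp[a] == dq[a] for a in dp.keys() & dq.keys()):
                out.add(frozenset({**dp, **dq}.items()))
    return out


r1 = [{"A": 1, "X": frozenset({"b1", "b2"})},
      {"A": 2, "X": frozenset({"b3"})}]
r2 = [{"X": frozenset({"b2", "b3"}), "D": "d"}]

join_of_unnested = flat_join(unnest(r1, "X"), unnest(r2, "X"))
unnested_join = unnest(ext_join(r1, r2, "X"), "X")
```

On this instance both orders of evaluation yield the same two flat tuples, which is what Proposition 5.10 asserts in general.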


5.2.6  Extended  Projection 


Extended  projection  is  a  normal  projection  followed  by  a  tuplewise  extended 
union  of  the  result.  The  union  merges  tuples  which  agree  on  the  zero  order 
attributes  left  in  the  projected  relation. 

Definition 5.10: The extended projection of relation $r$ on attributes $X$ is

$\pi^e_X(r) = \bigcup^e_{t \in \pi_X(r)} (t)$

Note that projection still removes duplicate tuples, that is, those which agree
on all attributes, with set equality holding on higher order attributes.
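
A minimal sketch of extended projection follows, assuming a single nesting level in which each higher order attribute holds a set of atomic values, so that the tuplewise extended union reduces to set union on those attributes. The positional encoding, parameter names, and data are assumptions for illustration.

```python
def ext_project(rel, zero_idx, high_idx):
    """Project onto the given positions, then merge tuples that agree
    on the retained zero order attributes by unioning their nested sets."""
    merged = {}
    for t in rel:
        key = tuple(t[i] for i in zero_idx)
        sets = merged.setdefault(key, [frozenset() for _ in high_idx])
        for n, i in enumerate(high_idx):
            sets[n] = sets[n] | t[i]
    return {key + tuple(sets) for key, sets in merged.items()}


# Scheme (A, B, X) with X a nested set; project onto (A, X).
r = {
    ("a1", "b1", frozenset({1, 2})),
    ("a1", "b2", frozenset({2, 3})),
    ("a2", "b1", frozenset({4})),
}
projected = ext_project(r, zero_idx=[0], high_idx=[2])

# Both sides of the preciseness equation, computed on this instance.
unnest_of_projection = {(a, x) for a, xs in projected for x in xs}
projection_of_unnest = {(a, x) for a, _, xs in r for x in xs}
```

The two tuples for "a1" collapse into one whose nested set is the union of the originals, and unnesting the result gives the same flat tuples as projecting the unnested relation.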


Proposition 5.12: Extended projection is a precise generalization of standard
projection with respect to unnesting.

Proof: We need to show that $\pi_{X'}(\mu^*(r)) = \mu^*(\pi^e_X(r))$, where $X'$ are all
of the attributes of the completely unnested scheme $X$. We show inclusion both
ways to prove the equivalence.

$\subseteq$ Let $t$ be a tuple in $\pi_{X'}(\mu^*(r))$. Then $t$ is the projection onto
$X'$ of some tuple which unnested from a tuple $t_r$ in $r$. For $\pi^e_X(r)$, $t_r$
will be projected onto $X$ and possibly combined with other tuples in an
extended union. In any case, when unnested, the tuple $t$ will be in
the result.

$\supseteq$ Let $T$ be a set of tuples in $\mu^*(\pi^e_X(r))$ such that all tuples in
$T$ unnested from a single tuple $t$ in $\pi^e_X(r)$. Then there are two cases:
either $t$ came directly from a projection of $r$, or $t$ is a combination of
tuples in the projection of $r$.

Case 1: Suppose $t$ came directly from a tuple in a projection of $r$. Then the
projection hasn't been altered by the extended union, and since unnest
commutes with projection [FT], all tuples in $T$ will be in $\pi_{X'}(\mu^*(r))$.

Case 2: Suppose $t$ is the extended union of two or more tuples in the
projection of $r$. Then all of these tuples will be combined only where
they agree on non-leaf attributes of the scheme tree for $\pi^e_X(r)$. Now,
the unnest of $r$ will not eliminate any of these tuples, so the projection
onto $X'$ will return all tuples in $T$. □

5.3 Closure of PNF Relations Under the Extended Operators

Theorem  5-3.  The  class  of  PNF  relations  is  closed  under  extended  union, 
extended  intersection,  extended  difference,  cartesian  product,  extended  natural 
join,  extended  projection,  and  selection. 

Proof:  The  proofs  for  each  operator  are  presented  below. 

•Extended Union — We show that for any relation structures
$\mathcal{R} = (R, r_1)$ and $\mathcal{S} = (R, r_2)$ with attribute set $E_R$ that
$\mathcal{T} = \mathcal{R} \cup^e \mathcal{S}$ is a PNF relation.


By definition of $\cup^e$, $\mathcal{T}$ has scheme $R$ with attribute set $E_R$. Let the
instance of $\mathcal{T}$ be $r_3$. We need to show that, in $r_3$, $A \to E_R$, where $A$ is
the set of zero order attributes of $E_R$. Suppose it does not. Then two tuples
$t_1$ and $t_2$ in $r_3$ must agree on $A$ and yet disagree on $E_R$. Now $t_1$ (and
likewise $t_2$) either was carried over in total from $r_1$ or $r_2$, since it
disagreed on $A$ with all tuples in the other relation, or was created from
tuples, one each in $r_1$ and $r_2$, which agreed on $A$ and had the values of their
higher order attributes combined with a recursive application of extended
union. Thus there are four cases:

Case 1. $t_1$ and $t_2$ both carried over in total:

$t_1$ and $t_2$ cannot both come from one relation, as each is in PNF and
if $t_1$ agrees with $t_2$ on $A$ then they agree on $E_R$. They cannot come
from different relations, as they agree on $A$ and yet each is required
by the definition of extended union to disagree with all other tuples
in the other relation on $A$. Thus we have a contradiction for case 1.

Case 2. $t_1$ carried over in total and $t_2$ created from a tuple in each of $r_1$
and $r_2$:

Suppose $t_1$ came from $r_1$. Then $t_1$ disagrees with all tuples in $r_2$ on
$A$. But $t_2$ was created from tuples that agreed on $A$, one in each of
$r_1$ and $r_2$. The argument for $t_1$ coming from $r_2$ is symmetric, and so
case 2 leads to a contradiction.

Case 3. Symmetric to case 2 with $t_1$ and $t_2$ interchanged.

Case 4. $t_1$ and $t_2$ both created from a tuple in each of $r_1$ and $r_2$:

Since $t_1$ and $t_2$ agree on $A$, all tuples in $r_1$ and $r_2$ from which
they were created agree on $A$. Thus all tuples from $r_1$ must be the
same tuple, as $A \to E_R$ holds in $r_1$. The symmetric argument holds
for $r_2$. Thus $t_1$ and $t_2$ were both created from the same two tuples, by
an identical operation, and, therefore, agree on $E_R$. Thus we have a
contradiction for case 4.


Since cases 1-4 all produced a contradiction, the hypothesis is false and indeed
$A \to E_R$ in $r_3$, and so $\mathcal{T}$ is in PNF.
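
The closure argument for extended union can be checked mechanically on small instances. The sketch below is an illustration under stated assumptions: relations are frozensets of tuples, a scheme is a tuple with None for zero order attributes and a nested scheme otherwise, both inputs are already in PNF, and `is_pnf` tests the condition used in the proof (the zero order attributes determine the whole tuple, recursively in every nested relation). The function names and data are hypothetical.

```python
def zero_key(t, scheme):
    """Values of the zero order attributes of tuple t."""
    return tuple(v for v, s in zip(t, scheme) if s is None)


def is_pnf(rel, scheme):
    """True if the zero order attributes form a key at every level."""
    keys = [zero_key(t, scheme) for t in rel]
    if len(keys) != len(set(keys)):
        return False
    return all(
        is_pnf(v, s)
        for t in rel
        for v, s in zip(t, scheme)
        if s is not None
    )


def ext_union(r1, r2, scheme):
    """Carry over unmatched tuples; merge matched ones recursively."""
    merged = {zero_key(t, scheme): t for t in r1}
    for t in r2:
        k = zero_key(t, scheme)
        if k in merged:
            merged[k] = tuple(
                v if s is None else ext_union(v, w, s)
                for v, w, s in zip(merged[k], t, scheme)
            )
        else:
            merged[k] = t
    return frozenset(merged.values())


SCHEME = (None, (None,))      # A and one nested relation on B
r1 = frozenset({("a", frozenset({("b",)}))})
r2 = frozenset({("a", frozenset({("b2",)})), ("a2", frozenset({("b",)}))})

union_e = ext_union(r1, r2, SCHEME)
plain_union = r1 | r2
```

Here `union_e` is in PNF, while the ordinary set union keeps two tuples with the same value of A and therefore is not.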

•Extended  Intersection —  This  proof  is  the  same  as  for  extended 
union  except  that  there  is  only  one  case  in  the  case  analysis  that  applies,  case 
4. 

•Extended Difference — This proof is the same as for extended
union except we need only consider tuples carried over in total from just $r_1$.

•Cartesian Product — Let $\mathcal{V} = (V, v) = \mathcal{R} \times \mathcal{S}$, where
$\mathcal{R} = (R, r)$ and $\mathcal{S} = (S, s)$. We assume that the attributes have been
renamed so that $E_R \cap E_S = \emptyset$. Then $E_V = E_R \cup E_S$. We show that
$AB \to E_R E_S$ holds in $v$, where $A$ is the set of zero order attributes in $E_R$
and $B$ is the set of zero order attributes in $E_S$. Suppose it does not. Then
two tuples $t_1$ and $t_2$ in $v$ must agree on $AB$ and yet disagree on $E_R E_S$.
Assume the disagreement is in $E_R$, as a symmetric argument can be made for
$E_S$.

We have $A \to E_R$ in $r$ since $\mathcal{R}$ is in PNF. We also have that each tuple
in $v$ agrees with some tuple in $r$ on $E_R$. Thus there are tuples in $r$ that agree
with $t_1$ and $t_2$ on $E_R$. Since $t_1$ and $t_2$ agree on $AB$ they agree on $A$ but, as
assumed, disagree on $E_R E_S$ and so disagree on $E_R$. Thus, $A \to E_R$ does not
hold in $r$, which is a contradiction. Therefore, the hypothesis is false and
$\mathcal{V}$ is in PNF.


•Extended Natural Join — Let $\mathcal{V} = (V, v) = \mathcal{R} \bowtie^e \mathcal{S}$, where
$\mathcal{R} = (R, r)$ and $\mathcal{S} = (S, s)$. We have that $E_V = E_R E_S$. Let
$X = E_R \cap E_S$, $A = E_R - X$, and $B = E_S - X$. Let $A_z A_h = A$, where $A_z$ are
the zero order attributes of $A$ and $A_h$ are the higher order attributes of $A$.
Similarly, let $B_z B_h = B$ and $X_z X_h = X$.

We show that $A_z B_z X_z \to E_R E_S$ holds in $v$. Suppose it does not.
Then two tuples $t_1$ and $t_2$ in $v$ must agree on $A_z B_z X_z$ and yet disagree on
$E_R E_S$. This disagreement is either on $A_h$, $B_h$, or $X_h$. If the disagreement
is on $A_h$ or $B_h$, then the arguments of cartesian product apply and a
contradiction is reached. If the disagreement is on $X_h$, then the argument of
case 4 of union applies, since the tuples from which $t_1$ and $t_2$ came must be
identical in $r$ and $s$, as the FDs $A_z X_z \to E_R$ holds in $r$ and
$B_z X_z \to E_S$ holds in $s$. Thus we reach a contradiction, and so $\mathcal{V}$ is in
PNF.

•Extended  Projection —  When  an  extended  projection  operation  is 
applied  to  a  relation  we  do  not  change  any  FDs  that  hold  in  the  nested  relations 
of  each  tuple,  as  we  either  take  the  nested  relation  in  total  or  eliminate  it.  Also 
if  all  nested  relations  meet  the  requirements  to  be  in  PNF  then  a  single  tuple 
containing  these  nested  relations  is  automatically  in  PNF.  Therefore,  we  can 
apply  the  proof  for  union  since  extended  projection  is  a  tuplewise  extended 
union  of  the  tuples  resulting  from  a  normal  projection  operation,  each  of  which 
we  determined  was  a  PNF  relation. 


•Selection — A subset of the tuples of a relation cannot violate an
FD that holds on the entire relation, so any selection of tuples from a PNF
relation produces a PNF relation.


□ 




Chapter  6 

Equivalence  of  the  Relational  Calculus 
and  the  Relational  Algebra 


In  this  chapter,  we  prove  that  the  relational  calculus  and  algebra  as  extended  to 
handle  nested  relations  are  equivalent.  We  first  show  that  all  relational  algebra 
expressions  can  be  expressed  in  the  safe  relational  calculus,  and  then  the  inverse 
relationship. 

6.1 Reduction of Relational Algebra to Relational Calculus

Theorem 6-1. If $E$ is a relational algebra expression, then there is a safe
expression in the relational calculus equivalent to $E$.

Proof: The proof is by induction on the number of occurrences of operators
in $E$. The basis and the five cases (Cases 1-5) for $\cup$, $-$, $\times$, $\pi$, and
$\sigma$ are as in [Ull]. We need two more cases for the operators $\nu$ and $\mu$.

Case 6: $E = \nu_{B=(A_1 A_2 \ldots A_k)}(E_1)$. Let $E_1$ be equivalent to safe
expression $\{t^{(n)} \mid \psi_1(t)\}$ and let attribute $A_i$ correspond to the
$j_i$th attribute, for $1 \le i \le k$, and let all attributes not among the $A_i$
correspond to the $j_l$th attribute, for $k < l \le n$. Then $E$ is equivalent to

$\{ t^{(n-k+1)} \mid (\exists u)(\psi_1(u) \wedge \bigwedge_{m,l} t[m] = u[j_l] \wedge t[j_1] = \{ w^{(k)} \mid (\exists v)(\psi_1(v) \wedge \bigwedge_{m,l} t[m] = v[j_l] \wedge \bigwedge_{i=1}^{k} w[i] = v[j_i]) \}) \}$

where $m$ ranges over $[1 : j_1 - 1,\ j_1 + 1 : n - k + 1]$ as $l$ ranges over
$[k + 1 : n]$.

Since sets-of-values are being created, we need to check if the elements are from
a finite domain, and, in this case, they are from $DOM(\psi_1)$, so this expression
is safe.

Case 7: $E = \mu_A(E_1)$. Let $E_1$ be equivalent to safe expression
$\{t^{(n)} \mid \psi_1(t)\}$ and let attribute $A$ correspond to the $i$th attribute
and let the arity of $A$ be $k$. Then $E$ is equivalent to

$\{ t^{(n+k-1)} \mid (\exists u)(\psi_1(u) \wedge \bigwedge_{m,l} t[m] = u[l] \wedge (\exists w)(w \in u[i] \wedge \bigwedge_{p,q} t[p] = w[q])) \}$

where $m$ ranges over $[1 : i - 1,\ i + k : n + k - 1]$ as $l$ ranges over
$[1 : i - 1,\ i + 1 : n]$, and where $p$ ranges over $[i : i + k - 1]$ as $q$ ranges
over $[1 : k]$.

As in case 6, the elements of $DOM(\psi_1)$ are the only ones used in this
expression, so it is safe as well. □

6.2 Reduction of Relational Calculus to Relational Algebra


Theorem 6-2. If $E$ is a safe expression in the relational calculus then there
is a relational algebra expression equivalent to $E$.

In order to prove the theorem we must first establish some basic results.

Lemma 6-1. If $\psi$ is any formula in tuple calculus then there is an equivalent
formula $\psi'$ of tuple calculus with no occurrences of $\wedge$ or $\forall$. If $\psi$ is
safe, so is $\psi'$.

Proof:  See  [Ull],  Lemma  5-2.  □ 

Lemma 6-2. If $\psi$ is any formula in tuple calculus then there is an algebra
expression for $DOM(\psi)$.

Proof: Completely unnest each relation and constant that contains nested
relations and appears in $\psi$. Then, as in [Ull], use projection and union to form
a unary relation containing all possible values that are mentioned in $\psi$. □
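
The construction in this proof can be sketched outside the algebra as a recursive walk that collects atomic values. The Python below is only an illustration of what the resulting unary relation contains, not the algebra expression itself. It assumes relations are sets or frozensets of tuples and that atomic values are never sets; the function names and data are hypothetical.

```python
def atoms(value):
    """Yield every atomic value inside a possibly nested relation."""
    if isinstance(value, (set, frozenset)):    # a relation: visit each tuple
        for t in value:
            for component in t:
                yield from atoms(component)
    else:
        yield value                            # an atomic value


def dom(relations, constants=()):
    """Unary relation of all atomic values in the relations and constants."""
    values = set()
    for item in list(relations) + list(constants):
        values.update(atoms(item))
    return {(v,) for v in values}


r = {("a", frozenset({("b",), ("c",)}))}
domain = dom([r], constants=[5])
```

Each nested relation contributes its atomic values exactly as if it had been completely unnested and then projected one column at a time.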

Our proof of the theorem mirrors the proof in [Ull] of the equivalence
of the (1NF) relational calculus and algebra. In [Ull], an algebra expression
was created which produced a unary relation $E$ of all values either mentioned
explicitly as constants in the calculus expression or existing in any relation
mentioned in the calculus expression. Each atom of the calculus expression is
then translated as a function of $\times_{i=1}^{n} E$, where $n$ is the number of
attributes in all tuple variables being used in the subexpression where the
atom occurs. The relation $E$ is basically a domain of values from which the
calculus expression must create the tuples in the result. However, when we move
to ¬1NF relations, it is not possible to create a domain of values using this
technique. Each tuple variable may range over values that are nested relations,
and so to include all possible nested relations, we would have to have a
technique for creating a powerset using the relational algebra. Since it is not
possible to create a powerset using the algebra (see Appendix A), we will use
subsets of all possible tuples for each tuple variable and each component of a
tuple variable that is defined as a nested relation. These limited domains,
when completely unnested, contain all possible tuples from which the calculus
expression will select tuples for a completely unnested result.

Definition 6.1: A limited domain for a tuple variable $t$, denoted $D_t$, appearing
in a safe calculus expression $\{x \mid \psi(x)\}$, is an extended relational algebra
expression which produces a ¬1NF relation $r$ which, when completely unnested,
contains all tuples, made up of values from $DOM(\psi)$, which need to be tested
for inclusion or exclusion by the atoms of the calculus expression referring to
$t$. The ¬1NF tuples which actually are tested by $t$ will be an extended
intersection of $D_t$.

If there is a subformula of the form $(\exists t)(p(t))$, then a limited domain
for $t$ contains tuples to be included in the result if they satisfy $p$. If there
is a subformula of the form $\neg(\exists t)(p(t))$, or $(\forall t)(\neg p(t))$, then a
limited domain for $t$ contains tuples to be excluded from the result if they
satisfy $p$. In the main body of the proof we present a way to construct an
algebra expression which performs the proper inclusions and exclusions on the
tuples in each limited domain. We use the extended operators defined in
Chapter 5 to include and exclude tuples from nested relations.

Lemma 6-3. Given a safe tuple calculus expression $\{t \mid \psi(t)\}$, there is an
algebra expression $D_{t_i}$ for a limited domain of each tuple variable $t_i$
mentioned in $\psi$, or any nested expression of $\psi$.

Proof: Since the calculus expression is safe, we claim that we can determine
each $D_{t_i}$ by scanning the expression for named relations and constants. Each
atom in the expression constrains the values that a tuple variable or a
component of a tuple variable may assume.

The following algorithm examines each atom in the expression and
adds algebra expressions to each domain so that the possible values which that
atom references will be included in the domain. The intuition behind this
algorithm is as follows. When atoms refer to named relations and constants,
the reference is direct and known. However, when the atoms refer only to tuple
variables, the reference is indirect and must be solved in terms of tuple
variables which have direct and known references. In addition, there may be
more than one atom which references a particular attribute of a tuple variable,
and so we may get multiple expressions for each domain. Thus, as the algorithm
creates the algebra expression for each domain, it also creates a graph which
tells us how to solve the indirect references in our algebra expressions. Let
$D_t^i$ be the algebra expression for the limited domain of the $i$th attribute of
tuple variable $t$. The graph will be constructed of nodes, directed edges, and
directed and-edges. A directed and-edge is a single edge which goes from a
single node to a set of one or more nodes. Nodes will be labeled with the
limited domain variable, and edges will be labeled with algebra expressions
which may become part of the limited domain of the node from which the
edges emanate, and with a special label if the atom for which the label was
created involved a > comparison. Atoms involving > comparisons usually do not
add anything to the limited domains that would not be included by another type
of atom. However, there is the special case where two atoms define a range of
values which is the only specification of the limited domain of some component
of a tuple variable; e.g., $x[1] > 2 \wedge \neg(x[1] > 5)$. In this case, we use the
algebra expression for $DOM(\psi)$ (Lemma 6-2), so that we get every value in the
range. Note that if there are values in the range that are not in $DOM(\psi)$ then
the expression is not safe.

Algorithm 1

Create a graph with one node labeled RC, standing for named relations
and constants;

For each tuple variable $t$
do
    let $k$ denote the arity of $t$;
    create $k$ nodes in the graph, labeled $D_t^i$, $1 \le i \le k$
end do

For each atom in the calculus expression
do
    case atom
    $t \in r$ :
        let $k$ denote the arity of $t$;
        add directed edges from $D_t^i$ to RC, $1 \le i \le k$;
        for $1 \le i \le k$, label the edge from $D_t^i$ to RC with $\pi_i(r)$;
    $t \in u[j]$ :
        let $k$ denote the arity of $t$;
        add directed edges from $D_t^i$ to $D_u^j$, $1 \le i \le k$;
        for $1 \le i \le k$, label the edge from $D_t^i$ to $D_u^j$ with $\pi_i(\mu_1(D_u^j))$;
    $t[j]\ \theta\ a$ or $a\ \theta\ t[j]$, $\theta \in \{=, >\}$ :
        add a directed edge from $D_t^j$ to RC;
        label the edge $C$, where $C$ is a new unary relation
            containing the single tuple $\langle a \rangle$;
        add a special label, \$, to the edge if $\theta$ is >;
    $t[j]\ \theta\ u[i]$, $\theta \in \{=, >\}$ :
        add directed edges from $D_t^j$ to $D_u^i$ and from $D_u^i$ to $D_t^j$;
        label the edge from $D_t^j$ to $D_u^i$ with $D_u^i$;
        label the edge from $D_u^i$ to $D_t^j$ with $D_t^j$;
        add a special label, \$, to each edge if $\theta$ is >;
    $t[j] = \{u^{(l)} \mid \psi'(u)\}$ :
        add a directed and-edge from $D_t^j$ to the set of nodes $D_u^i$,
            $1 \le i \le l$;
        label the and-edge $\nu_{1=(1,2,\ldots,l)}(D_u^1 \times D_u^2 \times \cdots \times D_u^l)$;
    end case
end do

Mark node RC;

Let $V$ be the algebra expression for $DOM(\psi)$;

While some node in the graph is not marked
do
    Choose an unmarked node $N$ with at least one edge, without a
        special label \$, directed towards a marked node, or
        at least one and-edge directed towards a set of nodes, all of
        which are marked, or
        if neither of the above cases applies, at least one edge, with a
        special label \$, directed towards a marked node;
    If the special case using label \$ was invoked then let $C = V$
        else let $C = \emptyset$;
    Set the algebra expression for the domain labeled $N$ to
        $L_1 \cup L_2 \cup \cdots \cup L_p \cup C$, where $p$ is the number of edges
        and and-edges directed from $N$ to marked nodes,
        and $L_i$ is the label of the $i$th such edge;
    Mark node $N$
end do

For each tuple variable $t$ with arity $k$, set $D_t$ to
$D_t^1 \times D_t^2 \times \cdots \times D_t^k$.
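
The marking phase of Algorithm 1 can be sketched in Python as follows. This is a simplified illustration under several assumptions: the graph has already been built from the atoms, edge labels are kept as plain strings rather than algebra expressions, node names such as "Dt1" stand for $D_t^1$, and the function raises an error when no unmarked node can be chosen (the correctness argument treats that situation as an unsafe expression). None of these names come from the original.

```python
def solve_limited_domains(nodes, edges, and_edges, dom_expr="DOM"):
    """Mark nodes outward from RC and build a symbolic domain for each.

    nodes:     names of the limited domain nodes (RC is implicit)
    edges:     (source, target, label, special) tuples; special marks '$'
    and_edges: (source, targets, label) tuples
    """
    marked = {"RC"}
    domains = {}
    while len(domains) != len(nodes):
        chosen = None
        # First look for a node usable without the '$' special case.
        for allow_special in (False, True):
            for n in sorted(set(nodes) - marked):
                plain = [e for e in edges if e[0] == n and e[1] in marked]
                ands = [e for e in and_edges
                        if e[0] == n and marked.issuperset(e[1])]
                if allow_special:
                    usable = any(e[3] for e in plain)
                else:
                    usable = any(not e[3] for e in plain) or bool(ands)
                if usable:
                    chosen = (n, plain, ands, allow_special)
                    break
            if chosen:
                break
        if chosen is None:
            raise ValueError("a node cannot reach RC: expression is not safe")
        n, plain, ands, used_special = chosen
        labels = [e[2] for e in plain] + [e[2] for e in ands]
        if used_special:
            labels.append(dom_expr)     # C = V in the special case
        domains[n] = " U ".join(labels)
        marked.add(n)
    return domains


# Atoms: t in r (arity 1) and u[1] = t[1].
nodes = ["Dt1", "Du1"]
edges = [
    ("Dt1", "RC", "pi_1(r)", False),
    ("Dt1", "Du1", "Du1", False),
    ("Du1", "Dt1", "Dt1", False),
]
domains = solve_limited_domains(nodes, edges, and_edges=[])
```

In the example, the node for t is resolved directly from the named relation r, and the node for u is then resolved indirectly through the node for t.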


The correctness of this algorithm follows from the following arguments.
First, we show that the algorithm halts. Suppose that it does not halt.
Then there must be unmarked nodes in the graph and no path from them to the
node RC. Consider the tuple variables naming these nodes. The variables are
used only in atoms which never refer to any of the relations or constants in the
expression. So they can take unknown values and still satisfy the expression.
As there is no way to determine these values, the expression must be unsafe.
This is a contradiction, and so the algorithm must halt.


The expressions are correct if each limited domain includes all possible
tuples which the calculus expression will include or exclude from the result.
Suppose some limited domain $D_t$ does not include all such tuples. Then there
must be an atom in which $t$ appears that must test values not appearing in
$D_t$. The atom cannot compare $t$ or a component of $t$ to a named relation or
constant using $=$ or $\in$, since these tuples are always included due to the
initial marking of node RC. If the comparison involves >, then there must be
other comparisons involving $t$ in order for the expression to be safe. Thus, the
atom compares $t$, or a component of $t$, with either the component of another
tuple variable, $x$, or a set of tuples, $u$, created by a nested calculus expression.
Let us assume that the entire tuple variable is being accessed; otherwise, add
the appropriate superscript to the limited domain variable if only a component
is being accessed.

In the first case, either $D_t$, $D_x$, or both $D_t$ and $D_x$ are determined
by other comparisons in other atoms. Consider each of these subcases. (1) If
$D_t$ is determined by comparisons within other atoms and $D_x$ is not, then the
comparison involving $t$ and $x$ does not add any tuples, and $D_x$ is a subset of $D_t$.
(2) If $D_x$ is determined by comparisons within other atoms and $D_t$ is not, then
$D_t$ is a subset of $D_x$ and we must make a new argument for $D_x$. If we continue
to invoke this subcase, a trivial induction shows that we eventually run out of
tuple variables, and if the last variable used is not expressed in terms of named
relations or constants then the expression is not safe. (3) If both $D_x$ and $D_t$
are determined by other comparisons then the algorithm either adds $D_x$ to $D_t$
or $D_t$ to $D_x$, and so subcase 1 and subcase 2 apply, respectively.

In  the  case  of  comparison  with  a  set  of  tuples  u,  it  must  be  that  the 
limited  domain  Du  does  not  contain  all  possible  tuples,  and  so  we  make  a  new 
argument  for  Du .  This  case  can  only  be  invoked  as  long  as  there  are  still  nested 
calculus  expressions.  Once  we  have  exhausted  them,  the  first  case  applies. 

Thus,  either  the  expression  is  unsafe,  or  we  have  included  all  the 
necessary  tuples  in  our  limited  domains,  and  so  the  algorithm  is  correct.  □ 
Proof of Theorem 6-2: Let $\{t \mid \psi(t)\}$ be a safe tuple calculus expression. We
construct an equivalent algebra expression. By Lemma 6-3 we have an algebra
expression $D_x$ for each tuple variable $x$ mentioned in $\psi$. By Lemma 6-1 we may
assume that $\psi$ has only the operators $\lor$, $\lnot$, and $\exists$.

We prove by induction on the number of operators in a subformula $\omega$
of $\psi$ that if $\omega$ has free variable $s$, then

$D_s \cap^e \{s \mid \omega(s)\}$

has an equivalent expression in relational algebra. Then, as a special case, when
$\omega$ is $\psi$ itself, we have an algebraic expression for

$D_t \cap^e \{t \mid \psi(t)\}$

Since $\psi$ is safe, intersection with $D_t$ does not change the relation denoted, so
we shall have proved the theorem. We use the extended intersection operator
since $D_t$ may contain nested relations which need to be intersected with the
corresponding nested relations produced by $\psi$.

In order to avoid problems where $\nu_A(\mu_A(r)) \neq r$, and so that the
extended operators do not interact improperly, we assume each database relation
($r, s, \ldots$), their nested relations, and relations created by collecting constants
into a limited domain, have an implicit keying attribute (or set of attributes)
whose value uniquely determines the values of all other attributes. We
consider this attribute to be added to each relation before it is used and removed
when the relation is projected or presented as the final result, using
appropriate algebra operations. A key can always be added to a relation by making
a side-by-side copy of the relation with itself and using one of the copies as a
key. If a nested relation needs a key then, after ensuring the relations in which
the nested relation resides are keyed, we unnest that nested relation, make a
side-by-side copy to gain a key, and then renest, adding the key to the nested
relation. If $r$ is a relation with arity $n$, then a side-by-side copy can be made
as follows:

$\sigma_{1=n+1 \,\land\, 2=n+2 \,\land\, \cdots \,\land\, n=n+n}(r \times r)$

The  first  n  attributes  of  this  new  relation  then  serve  as  the  key.  Fewer  at¬ 
tributes  can  be  used  if  they  are  a  primary  key  for  relation  r,  and  the  above 
expression  can  be  projected  to  retain  only  those  attributes  in  the  key  portion. 
Note  that  relations  which  are  in  partitioned  normal  form  already  satisfy  these 
key  constraints. 
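The side-by-side copy can be illustrated with a short Python sketch. It assumes a flat relation modelled as a set of equal-length tuples; the function name `side_by_side_copy` is hypothetical. The selection over the product is written out literally, although it reduces to pairing each tuple with itself.

```python
from itertools import product


def side_by_side_copy(r):
    """Select from r x r the tuples whose attribute i equals attribute n+i."""
    if not r:
        return set()
    n = len(next(iter(r)))  # arity of r
    return {a + b for a, b in product(r, r)
            if all(a[i] == b[i] for i in range(n))}


# Hypothetical relation of arity 2; the first n attributes of the copy act as the key.
r = {("a", 1), ("b", 2)}
keyed = side_by_side_copy(r)
```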

We  now  proceed  with  the  inductive  proof. 

Basis: Zero operators in $\omega$. Then $\omega$ is an atom, which we may take to be in one
of the forms described in Chapter 4. In order to specify an algebra expression
for these atoms, which may, by themselves, specify infinite relations, we need
to operate on an expression $D = D_{s_1} \times D_{s_2} \times \cdots \times D_{s_n}$, where the $s_i$ are all
the free tuple variables of the formula $\omega$ of which this atom is currently a part.

The atoms are thus translated:

1. $s \in r$: Replace $D_s$ in $D$ by $r$.


2. $s \in t[i]$: Let $p_1, p_2, \ldots, p_k$ be the attributes of $D_s$ in $D$, let $q^*$ be the
$i$th attribute of $D_t$ in $D$, and let $q_1, q_2, \ldots, q_k$ be the attributes of $q^*$.
Let $D'$ be

$\sigma_{p_1=q_1 \,\land\, \cdots \,\land\, p_k=q_k}(\mu_{q^*}(D))$

Then the desired expression is

$\pi_X(\sigma_F(D \times D'))$

where $X$ is the attributes of $D$ and $F$ is a predicate which matches
all attributes of $D$ except $q^*$ with the corresponding attributes in
$D'$. By unnesting we can access the elements of $t[i]$ using standard
relational algebra operators. In $D'$ the selection picks out those values
corresponding to tuple variable $s$'s domain $D_s$. This gives us a set of
values which we can use to choose the tuples of $D$ which have the sets
in $t[i]$ of which $s$ is a member. The final expression gives this result.

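3. $a \,\theta\, s[i]$, $s[i] \,\theta\, a$, $s[i] \,\theta\, t[j]$: Let $p$ be the $i$th attribute of $D_s$ and $q$ be the
$j$th attribute of $D_t$; then the desired algebra expressions are, respectively:

$\sigma_{a \theta p}(D)$, $\sigma_{p \theta a}(D)$, $\sigma_{p \theta q}(D)$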

4. $s[i] = \{u^{(j)} \mid \psi'(u, t_1, t_2, \ldots, t_n)\}$: We have $s$ as one of $t_1, t_2, \ldots, t_n$, and
$j$ as the arity of a new tuple variable $u$. Let $E'$ be an algebra expression
for $\psi'$ and $k$ be the arity of $D$. The desired algebra expression is $D$
with $D_s^i$ replaced by

$\pi_{k+1}(\nu_{k+1=(k+1,k+2,\ldots,k+j)}(E'))$.

Since $E'$ is an expression on $D \times D_u$, the $k+1$ through $k+j$ attributes of
$E'$ will be tuples corresponding to $u$. Since this is a nested expression
we apply the nest operation and use this new expression in place of
$D_s^i$ in the expression for this atom.

Induction: Assume $\omega$ has at least one operator and that the inductive
hypothesis is true for all subformulas of $\psi$ having fewer operators than $\omega$. We
now proceed to a case analysis covering each of the three operators. Let
$D = D_{t_1} \times D_{t_2} \times \cdots \times D_{t_n}$.

Case 1: $\omega(t_1, t_2, \ldots, t_n) = \omega_1(t_1, t_2, \ldots, t_n) \lor \omega_2(t_1, t_2, \ldots, t_n)$, where the $t_i$ are
the free tuple variables in the expression $\omega$. We do not require $\omega_1$ or $\omega_2$ to use
any or all of the $t_i$. Let $E_1$ be an algebraic expression for

$D \cap^e \{t_1, t_2, \ldots, t_n \mid \omega_1(t_1, t_2, \ldots, t_n)\}$

and $E_2$ an algebraic expression for

$D \cap^e \{t_1, t_2, \ldots, t_n \mid \omega_2(t_1, t_2, \ldots, t_n)\}$.

Then the desired expression is

$E_1 \cup^e E_2$.

Recall that, in Chapter 5, we outlined a procedure for expressing extended
union in terms of the basic relational algebra operators.


Case 2: $\omega(t_1, t_2, \ldots, t_n) = \lnot\omega_1(t_1, t_2, \ldots, t_n)$. Let $E_1$ be an algebraic expression
for

$D \cap^e \{t_1, t_2, \ldots, t_n \mid \omega_1(t_1, t_2, \ldots, t_n)\}$

then

$D -^e E_1$

is an expression for

$D -^e \{t_1, t_2, \ldots, t_n \mid \omega_1(t_1, t_2, \ldots, t_n)\}$

which is equivalent to

$D \cap^e \{t_1, t_2, \ldots, t_n \mid \lnot\omega_1(t_1, t_2, \ldots, t_n)\}$.

As for case 1, refer to Chapter 5 to see that extended difference is expressible
in terms of the basic relational algebra operators.

Case 3: $\omega(t_1, t_2, \ldots, t_n) = (\exists t_{n+1})(\omega_1(t_1, t_2, \ldots, t_{n+1}))$. Let $E_1$ be an algebraic
expression for

$(D \times D_{t_{n+1}}) \cap^e \{t_1, t_2, \ldots, t_{n+1} \mid \omega_1(t_1, t_2, \ldots, t_{n+1})\}$

Since $\psi$ is safe, $\omega$ is safe. The expression $\omega_1(t_1, t_2, \ldots, t_{n+1})$ is never true unless
$t_{n+1}$ is in the set $DOM(\omega_1)$, which is a subset of $DOM(\psi)$. Therefore $\pi_J(E_1)$,
where $J$ is the attributes of $t_1, t_2, \ldots, t_n$, denotes the relation

$D \cap^e \{t_1, t_2, \ldots, t_n \mid (\exists t_{n+1})(\omega_1(t_1, t_2, \ldots, t_{n+1}))\}$

which completes the induction, and proves the theorem. □

6.3  Examples 

To  illustrate  Lemma  6-2,  consider  the  following  calculus  expression: 

$\{t^{(2)} \mid (\exists s)((s \in r \lor s \in q) \land s[1] = t[1] \land t[2] = \{u \mid u \in s[2] \lor u \in s[3]\})\}$

Assume  that  r  and  q  are  relations  with  three  attributes,  the  second  and  third 
attributes  being  nested  relations  having  two  attributes  each. 

Before the marking phase of the algorithm the graph is as shown in
Figure 6-1. During the marking phase, RC is marked. Then $D_s^1$, $D_s^2$, and $D_s^3$
are marked and the term $D_t^1$ is not included in the expression for $D_s^1$, since $D_t^1$
is not yet marked. Then $D_t^1$, $D_u^1$, and $D_u^2$ can be marked, and, finally, $D_t^2$ is
marked since all nodes at the end of the and-edge are marked. The algebra
expressions at the end of the marking phase are:

$D_s^1 = \pi_1(r) \cup \pi_1(q)$

$D_s^2 = \pi_2(r) \cup \pi_2(q)$

$D_s^3 = \pi_3(r) \cup \pi_3(q)$

$D_t^1 = D_s^1$

$D_t^2 = \nu_{1=(1,2)}(D_u^1 \times D_u^2)$

$D_u^1 = \pi_1(\mu_1(D_s^2)) \cup \pi_1(\mu_1(D_s^3))$

$D_u^2 = \pi_2(\mu_1(D_s^2)) \cup \pi_2(\mu_1(D_s^3))$


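Substituting for the variables and applying the final cartesian products, we
have:

$D_s = (\pi_1(r) \cup \pi_1(q)) \times (\pi_2(r) \cup \pi_2(q)) \times (\pi_3(r) \cup \pi_3(q))$

$D_t = (\pi_1(r) \cup \pi_1(q)) \times \nu_{1=(1,2)}((\pi_1(\mu_1(\pi_2(r) \cup \pi_2(q))) \cup \pi_1(\mu_1(\pi_3(r) \cup \pi_3(q))))$
$\qquad \times\ (\pi_2(\mu_1(\pi_2(r) \cup \pi_2(q))) \cup \pi_2(\mu_1(\pi_3(r) \cup \pi_3(q)))))$

$D_u = (\pi_1(\mu_1(\pi_2(r) \cup \pi_2(q))) \cup \pi_1(\mu_1(\pi_3(r) \cup \pi_3(q))))$
$\qquad \times\ (\pi_2(\mu_1(\pi_2(r) \cup \pi_2(q))) \cup \pi_2(\mu_1(\pi_3(r) \cup \pi_3(q))))$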

For a complete example of the transformation process of Theorem 6-2,
consider the following calculus expression:

$\{t^{(2)} \mid (\exists s)(s \in r \land t[1] = s[1] \land t[2] = \{u^{(2)} \mid u \in s[2] \land u[2] \leq \text{'1970'}\})\}$

where $r$ is a relation on R = (course, Date), Date = (month, year). This query is
asking for all courses and the set of dates for the course with a year at most
1970.

Using the methodology of section 6.2 we translate this TRC expression
into an equivalent relational algebra expression. We start by transforming the
expression so that $\lor$, $\lnot$, and $\exists$ are the only operators present.

$\{t^{(2)} \mid (\exists s)(\lnot(\lnot(s \in r) \lor \lnot(t[1] = s[1]) \lor$
$\qquad \lnot(t[2] = \{u^{(2)} \mid \lnot(\lnot(u \in s[2]) \lor (u[2] > \text{'1970'}))\})))\}$

The domains corresponding to each tuple variable are

$D_s = \pi_1(r) \times \pi_2(r)$,

$D_t = \pi_1(r) \times \nu_{1=(1,2)}(\pi_1(\mu_1(\pi_2(r))) \times (\pi_2(\mu_1(\pi_2(r))) \cup \{(1970)\}))$,

$D_u = \pi_1(\mu_1(\pi_2(r))) \times (\pi_2(\mu_1(\pi_2(r))) \cup \{(1970)\})$.

We now proceed with the translation. Translate each atom:

$s \in r \ \rightarrow\ E_1 = D_t \times r$

$t[1] = s[1] \ \rightarrow\ E_2 = \sigma_{1=3}(D_t \times D_s)$

$t[2] = \{\ldots\} \ \rightarrow\ E_3 = (\pi_1(D_t) \times \pi_5(\nu_{5=(5,6)}(E'))) \times D_s$

where $E'$ is the algebra expression for $\{\ldots\}$.

Translate negation and disjunction:

$E_4 = (D_t \times D_s) -^e (((D_t \times D_s) -^e E_1) \cup^e ((D_t \times D_s) -^e E_2) \cup^e ((D_t \times D_s) -^e E_3))$

Translate the existential quantifier, and the final expression is:

$E = \pi_{1,2}(E_4)$

$E'$ is determined similarly.

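Translate the atoms:

$u \in s[2] \ \rightarrow\ E'' = \sigma_{4=6 \,\land\, 5=7}(\mu_4(D_t \times D_s \times D_u))$

$\qquad E'_1 = \pi_{1,2,3,4}(\sigma_{1=5 \,\land\, 2=6 \,\land\, 3=7}((D_t \times D_s \times D_u) \times E''))$

$u[2] > \text{'1970'} \ \rightarrow\ E'_2 = \sigma_{6 > \text{'1970'}}(D_t \times D_s \times D_u)$

Translate negation and disjunction (there are no existential quantifiers),
giving the result:

$E' = (D_t \times D_s \times D_u) -^e (((D_t \times D_s \times D_u) -^e E'_1) \cup^e E'_2)$

This ends the translation process. For comparison purposes the query
as it would directly be written in the algebra is

$\nu_{Date}(\sigma_{year \leq \text{'1970'}}(\mu_{Date}(r)))$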

This  assumes  that  the  course  values  are  all  unique  in  r.  If  not  we  would  need 
to  add  a  key  to  the  relation  so  that  the  nest  does  not  combine  sets  that  were 
separate  in  the  beginning. 
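The direct algebra form (unnest Date, select on year, nest Date again) can be sketched in Python. This is only an illustration: the encoding of the nested relation as a set of `(course, frozenset of (month, year))` pairs, the integer years, the function names, and the sample data are all assumptions. As in the text, it relies on course values being unique in $r$.

```python
def unnest_date(r):
    """Flatten (course, {(month, year), ...}) into (course, month, year) tuples."""
    return {(course, month, year) for course, dates in r for month, year in dates}


def select_year_at_most(flat, limit):
    """Keep the flat tuples whose year component does not exceed limit."""
    return {t for t in flat if t[2] <= limit}


def nest_date(flat):
    """Group (month, year) pairs back into one nested Date set per course."""
    groups = {}
    for course, month, year in flat:
        groups.setdefault(course, set()).add((month, year))
    return {(course, frozenset(dates)) for course, dates in groups.items()}


# Hypothetical data for the relation on R = (course, Date).
r = {
    ("CS101", frozenset({("Jan", 1969), ("Sep", 1972)})),
    ("CS102", frozenset({("Mar", 1975)})),
}
answer = nest_date(select_year_at_most(unnest_date(r), 1970))
```

In this sketch a course with no qualifying date does not appear in the result at all, because nothing is left to nest for it.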


Chapter  7 

Null Values in ¬1NF Relational Databases

A problem may arise in a ¬1NF representation of a database. Consider a
database of employees, their children and their skills. Figure 7-1 shows an
example 1NF version of this database, and Figure 7-2 shows the corresponding
¬1NF version. If we have an employee with several skills and no children, then,
in the database of Figure 7-1, we simply add tuples to the (employee, skill)
relation and add nothing to the (employee, child) relation. Now, consider the
representation of this information in the ¬1NF relation of Figure 7-2. In this
relation, a tuple seemingly requires that employees have at least one skill and
at least one child before they can be entered into the database. The solution
is to employ empty sets. This is the same problem encountered by users of
a universal relation system [K+]. In the ¬1NF case, null values can occur in
nested relations as well as for nondecomposable attributes. The empty set is,
in effect, a type of null value.

The various nulls which have been proposed vary in the type of
incomplete information they represent or the degree of the incompleteness. For
example, we may have different nulls to represent both the non-existence of a
value and the existence of a value that is not precisely known. In this chapter
we make the open world assumption. That is, we assume that just because a
tuple is not in a relation does not mean it should not be there. The best we can
do at any point in time is enter tuples into a relation that we know currently
belong there. In addition, if we know partial information about a tuple then
the unknown information is represented using null values.

Figure 7-1. 1NF representation of employee relation.

Figure 7-2. Employee relation in ¬1NF.


A different, although compatible, source of nulls occurs when we
attempt to represent multiple relationships among data in a single relation (an
extreme example being the universal relation assumption [FMU]). For
example, in a single relation we may want to represent facts about suppliers, parts,
and associations stating which suppliers supply which parts. If a supplier is
currently not supplying a part, then the part attributes of the relation must
contain null values. If null values are not allowed, then a non-supplying supplier
could not be represented in this relation.


Thus, the same motivation which requires us to add null values to a
traditional 1NF database holds for ¬1NF databases. However, the need for
nulls is even more critical in a ¬1NF database since otherwise we lose some of
its advantages. Since we have the ability to represent multiple relationships in
a single ¬1NF relation without the problems of redundancy that doing so in a
1NF relation would entail, we must also deal with the fact that one or more of
those relationships may be unknown or non-existent at some time.

The remainder of this chapter is organized as follows. In section 7.1,
we summarize a formal treatment of null values in the traditional relational
model. The no-information, unknown, and nonexistent interpretations of nulls
are discussed. We show that reasonable extensions to the traditional relational
operators are possible under the open world assumption. These extensions serve
as a basis for the main results of this chapter, the extension to ¬1NF. In section
7.2, we extend the null value theory presented in section 7.1 to ¬1NF relations,
and further extend the operators of Chapter 5 to deal with null values. Finally,
in section 7.3, we discuss dependency theory, shedding some new light on the
problem of nulls when dealing with functional and multivalued dependencies,
and their axiomatization.
7.1 Null Values in 1NF Relations

In this section, we briefly review the basic concepts that concern null values in
1NF relations. The presentation is based on some of the work of Zaniolo [Zan1,
Zan2]. We distinguish between three types of nulls:

•  ni  -  no-information, 

•  unk  -  unknown,  and 

•  dne  -  nonexistent  (or  does  not  exist), 

and  extend  each  domain  to  include  these  null  values. 

Previous approaches have usually assumed only one of the
interpretations is valid: unknown by [Bis1, Cod2, Gran, Mai1], and nonexistent by
[Lie2, Lie3, Sci, Zan1]. In [Vas2] a combination of the two is proposed in which
nonexistence is considered an inconsistent state of data. Finally, Zaniolo [Zan2]
provides a unified approach to nulls with the use of a no-information null. This
null is less informative than either an unknown or a nonexistent null, and can
be used to approximate both when we don't know whether or not a value
exists. As this is the most complete and conceptually sound approach proposed
to date, it forms the basis of our extensions to ¬1NF relations.

Other proposals for nulls are rather sophisticated, involving partial
specification [Lip1, Lip2, IL1, IL2], probability distributions [Won], and
conditional tuples [KW], but it could be argued "that the complexity of their
management is not justified by their richer semantics" [AM; 233].


7.1.1  Basic  Concepts 

When  dealing  with  incomplete  information,  we  talk  about  a  strength  ordering 
of  information  in  which  certain  tuples  will  be  more  informative  than  others, 
say  by  having  a  previously  unknown  value  replaced  by  an  actual  value,  or  by 
finding  out  that  a  value  for  which  we  previously  had  no-information  is  now 
known  not  to  exist.  In  order  to  compare  values  for  this  purpose  we  define  a 
greatest  lower  bound  function  which  tells  us  the  most  information  we  can  infer 
from  two  values  from  the  same  extended  domain. 

Definition 7.1: Let $\{d_1, d_2, \ldots, d_n\}$ be a domain and $D = \{d_1, d_2, \ldots, d_n, unk,
dne, ni\}$ the corresponding extended domain. A greatest lower bound function,
$glb(a, b)$, between two values $a$ and $b$ from $D$ is defined in Figure 7-3.

Figure 7-3. Definition of glb function.

This information can also be represented as a lattice with ni as the bottom
element, unk and dne as more informative nulls than ni, and actual values
$d_1, d_2, \ldots, d_n$ as more informative than unk. See Figure 7-4.

    d_1   d_2   ...   d_n
       \   |    /
          unk         dne
             \       /
                ni

Figure 7-4. Information lattice.

Note that the dne null is special in that it does not have a possible,
more informative, replacement. It is, in fact, a special "value" in itself, for which
equality is meaningful. That is, $dne = dne$, but $ni \neq ni$ and $unk \neq unk$.
Other restrictions on relations with dne nulls will be discussed in section 7.3.
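A greatest lower bound function consistent with the lattice just described can be sketched in Python. Since the table of Figure 7-3 is not reproduced here, the cases below are read off the lattice ordering and should be taken as an assumption; the string encodings of the three nulls are likewise hypothetical.

```python
NI, UNK, DNE = "ni", "unk", "dne"  # assumed encodings of the three nulls


def glb(a, b):
    """Greatest lower bound of two values of an extended domain."""
    if a == b:
        return a        # the meet of a lattice element with itself
    if NI in (a, b) or DNE in (a, b):
        return NI       # ni is the bottom; dne lies above ni only
    return UNK          # two distinct values, or a value and unk
```

For instance, `glb("typing", "filing")` is `unk` because two distinct actual values share only unk and ni as lower bounds, while `glb("typing", DNE)` drops to `ni`.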

We  now  define  an  information-wise  strength  ordering  of  tuples  using 
the  gib  function  as  follows: 

Definition 7.2: An $X$-value $s$ is said to be more informative than a $Y$-value
$t$, written $s \geq t$, if for each $B \in Y$, if $t[B]$ is not ni then $B \in X$, and for each
$A \in X \cap Y$, $glb(t[A], s[A]) = t[A]$.


Conversely, if $s \geq t$ we say that $t$ is less informative than $s$. The
notion of more informative is synonymous with the concept of subsumption. We
say $s$ subsumes $t$ when $s \geq t$. If we have two tuples in a relation such that one
is more informative than the other, then the less informative tuple is redundant
and can be removed. Note that in the absence of nulls, this condition reduces
to elimination of redundant identical tuples. If both $t \geq s$ and $s \geq t$, then we
say $t$ and $s$ are information-wise equivalent and write $s \cong t$.

As a running example in this section, we use relation schemes $R_1$ =
(employee, skill), and $R_2$ = (employee, child, skill).

Example 7.1: Let

$t_1$ = (Smith, Bill, typing), $t_2$ = (Smith, ni, unk)

denote $E_{R_2}$-values, and let

$t_3$ = (Smith, unk), $t_4$ = (Smith, typing)

denote $E_{R_1}$-values. Then, $t_1$ is more informative than $t_2$, $t_3$, and $t_4$.
Furthermore, $t_4 \geq t_2$, $t_4 \geq t_3$, and $t_2 \cong t_3$. □
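The comparisons in Example 7.1 can be checked mechanically. The sketch below assumes tuples are modelled as Python dicts from attribute name to value, and it derives `glb` from the information lattice rather than from the table of Figure 7-3; the function names are hypothetical.

```python
NI, UNK, DNE = "ni", "unk", "dne"  # assumed encodings of the three nulls


def glb(a, b):
    """Greatest lower bound read off the information lattice (an assumption)."""
    if a == b:
        return a
    if NI in (a, b) or DNE in (a, b):
        return NI
    return UNK


def more_informative(s, t):
    """True when s >= t in the sense of Definition 7.2."""
    for attr, value in t.items():
        if attr not in s:
            if value != NI:          # a non-ni value of t needs the attribute in s
                return False
        elif glb(value, s[attr]) != value:
            return False
    return True


def equivalent(s, t):
    """Information-wise equivalence: each tuple subsumes the other."""
    return more_informative(s, t) and more_informative(t, s)


t1 = {"employee": "Smith", "child": "Bill", "skill": "typing"}
t2 = {"employee": "Smith", "child": NI, "skill": UNK}
t3 = {"employee": "Smith", "skill": UNK}
t4 = {"employee": "Smith", "skill": "typing"}
```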

For certain relational operators it is convenient that all tuples be
defined over the same set of attributes. With the availability of a no-information
null we can extend tuples defined over different sets of attributes without
changing the information content of the tuples. The extension is done by adding
attributes used in one tuple and not in the other and assigning the value ni to
these added attributes.

In  order  to  find  the  most  informative  tuple  which  characterizes  two 
other  tuples  we  define  the  meet  operator  as  follows: 


Definition 7.3: The meet of an $X$-value, $t_1$, and a $Y$-value, $t_2$, is the
$XY$-value, $t$, written $t_1 \land t_2$, where for each attribute $A \in X \cap Y$, $t[A] =
glb(t_1[A], t_2[A])$, and for each attribute $B \notin X \cap Y$, $t[B] = ni$.

Example 7.2: Using the tuples defined in Example 7.1 we find that

$t_1 \land t_2 = t_2$

$t_1 \land t_4$ = (Smith, ni, typing)

□
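The meet of Definition 7.3 can be sketched in a few lines of Python and checked against Example 7.2. As assumptions for the illustration, tuples are dicts keyed by attribute name and `glb` is derived from the information lattice.

```python
NI, UNK, DNE = "ni", "unk", "dne"  # assumed encodings of the three nulls


def glb(a, b):
    """Greatest lower bound read off the information lattice (an assumption)."""
    if a == b:
        return a
    if NI in (a, b) or DNE in (a, b):
        return NI
    return UNK


def meet(t1, t2):
    """Meet of an X-value and a Y-value as an XY-value."""
    common = t1.keys() & t2.keys()
    return {a: glb(t1[a], t2[a]) if a in common else NI
            for a in t1.keys() | t2.keys()}


t1 = {"employee": "Smith", "child": "Bill", "skill": "typing"}
t2 = {"employee": "Smith", "child": NI, "skill": UNK}
t4 = {"employee": "Smith", "skill": "typing"}
```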


We also generalize the notion of a tuple being an element, or a member,
of a relation as follows.

Definition 7.4: A tuple $t$ is an x-element of a relation $r$, written $t \mathrel{\hat{\in}} r$, when
there exists a tuple $s \in r$ such that $s \geq t$.

Thus an x-element of a relation is any tuple that is equal to or less
informative than some tuple in the relation. We also write $t \mathrel{\hat{\notin}} r$ to denote
$\lnot(t \mathrel{\hat{\in}} r)$.

Given a set of tuples $t_1, t_2, \ldots, t_n$, we can eliminate tuples in which all
attributes have value ni (the null tuple)†, eliminate all tuples less informative
than some other tuple, and extend all tuples by adding ni values for attributes
not in the tuple but in some other tuple in the set. This is called tuple set
reduction and is denoted by

$\widehat{\{t_1, t_2, \ldots, t_n\}}$

† Even though a null tuple is subsumed by all tuples, it may be the only tuple in a
relation, and thus should be eliminated.
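Tuple set reduction can be sketched in Python as three passes: extend every tuple with ni, drop null tuples, and drop tuples subsumed by another tuple. The dict encoding of tuples, the lattice-derived `glb`, and the function names are assumptions of this illustration.

```python
NI, UNK, DNE = "ni", "unk", "dne"  # assumed encodings of the three nulls


def glb(a, b):
    """Greatest lower bound read off the information lattice (an assumption)."""
    if a == b:
        return a
    if NI in (a, b) or DNE in (a, b):
        return NI
    return UNK


def more_informative(s, t):
    """True when s >= t (Definition 7.2)."""
    for attr, value in t.items():
        if attr not in s:
            if value != NI:
                return False
        elif glb(value, s[attr]) != value:
            return False
    return True


def reduce_tuples(tuples):
    """Extend with ni, remove null tuples, remove subsumed tuples."""
    attrs = set().union(*(t.keys() for t in tuples))
    extended = []
    for t in tuples:
        e = {a: t.get(a, NI) for a in attrs}
        if all(v == NI for v in e.values()):
            continue                    # the null tuple carries no information
        if e not in extended:
            extended.append(e)          # keep one copy of identical tuples
    return [t for t in extended
            if not any(s is not t and more_informative(s, t) for s in extended)]


t1 = {"employee": "Smith", "child": "Bill", "skill": "typing"}
t2 = {"employee": "Smith", "child": NI, "skill": UNK}
t3 = {"employee": "Smith", "skill": UNK}
t4 = {"employee": "Smith", "skill": "typing"}
reduced = reduce_tuples([t1, t2, t3, t4])
```

Once all tuples range over the same attributes, two distinct tuples cannot subsume each other, so removing every tuple that is subsumed by a different one never discards both members of a pair.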


The  notion  of  being  more  informative  can  be  extended  to  relations. 

Definition 7.5: A relation $r_1$ is more informative than, or subsumes, a relation
$r_2$, written $r_1 \geq r_2$, when for each tuple $t_2 \in r_2$ there is a tuple $t_1 \in r_1$ with
$t_1 \geq t_2$.

This $\geq$ relationship is transitive and reflexive, leading to the following
definition of information-wise equivalence.

Definition 7.6: The relations $r_1$ and $r_2$ are information-wise equivalent,
written $r_1 \cong r_2$, when $r_1 \geq r_2$ and $r_2 \geq r_1$.

The equivalence relation $\cong$ partitions the universe of relations into
disjoint subclasses. Each class can be represented by a minimal relation in
which no tuples in the relation are subsumed by a tuple in the same relation.

Definition 7.7: A relation $r$ constitutes a minimal representation for a relation
$q$ when $r \subseteq q$, $r \cong q$, and $\nexists\, p \subset r$ such that $p \cong q$.

It is straightforward to show that the minimal representation of a relation is
unique and therefore minimum.

7.1.2  Operators 

In this section we briefly review extensions to the relational algebra operators
to 1NF relations with nulls. We treat the dne null as any other domain value
and, unless otherwise specified, any future reference to null will include only ni
and unk nulls. Some of this presentation is based on Section 12.4 of [Mai2].

Let $Rel^{\uparrow}$ denote the set of all relations having at least one null value
and let $Rel$ denote the set of all relations having no nulls, with $Rel^{\uparrow}(R)$ and
$Rel(R)$ denoting restrictions of $Rel^{\uparrow}$ and $Rel$ to relations on scheme $R$. We
shall view a relation $r$ in $Rel^{\uparrow}(R)$ as representing a set of relations from $Rel(R)$
that subsume $r$. Each such relation in $Rel(R)$ is called a possibility. The set of
possibilities for $r$ is denoted by $POSS(r)$, which is defined as:

$POSS(r) = \{q \mid q \in Rel(R) \text{ and } q \geq r\}$
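Because $POSS(r)$ is in general not a finite set, a sketch can only test membership. The Python below checks whether a candidate relation is a possibility for $r$; it assumes relations are lists of dicts on the same scheme, derives `glb` from the information lattice, and treats dne as an ordinary value, as stated above. All names are hypothetical.

```python
NI, UNK, DNE = "ni", "unk", "dne"  # assumed encodings of the three nulls


def glb(a, b):
    """Greatest lower bound read off the information lattice (an assumption)."""
    if a == b:
        return a
    if NI in (a, b) or DNE in (a, b):
        return NI
    return UNK


def more_informative(s, t):
    """True when tuple s >= tuple t (Definition 7.2)."""
    for attr, value in t.items():
        if attr not in s:
            if value != NI:
                return False
        elif glb(value, s[attr]) != value:
            return False
    return True


def subsumes(r1, r2):
    """True when relation r1 >= relation r2 (Definition 7.5)."""
    return all(any(more_informative(t1, t2) for t1 in r1) for t2 in r2)


def is_possibility(q, r):
    """True when q has no ni or unk values and q subsumes r."""
    null_free = all(v not in (NI, UNK) for t in q for v in t.values())
    return null_free and subsumes(q, r)


r = [{"employee": "Smith", "skill": UNK}]
q1 = [{"employee": "Smith", "skill": "typing"}]
q2 = [{"employee": "Jones", "skill": "typing"}]
q3 = q1 + q2
```

Under the open world assumption `q3` is also a possibility for `r`: extra tuples never prevent a relation from subsuming `r`.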

We extend the definition of relational operators to map sets of relations to other
sets of relations. For sets $P_1$ and $P_2$ of relations and relational operator $\gamma$,

$\gamma(P_1) = \{\gamma(q) \mid q \in P_1\}$ and

$P_1 \,\gamma\, P_2 = \{q_1 \,\gamma\, q_2 \mid q_1 \in P_1,\ q_2 \in P_2\}$.

We now discuss what constitutes a reasonable extension of a relational
operator relative to this possibility function. However, first, we want the
generalized operator to agree with the regular operator on $Rel$ without regard to
the possibility function.

Definition 7.8: Let $\gamma$ be an operator on $Rel$ and let $\gamma'$ be an operator on
$Rel^{\uparrow} \cup Rel$. We say that $\gamma'$ is faithful to $\gamma$ if one of the following two conditions
holds:

1. when $\gamma$ and $\gamma'$ are unary operators, $\gamma(r) = \gamma'(r)$ for every $r \in Rel$ for
which $\gamma(r)$ is defined.

2. when $\gamma$ and $\gamma'$ are binary operators, $r \,\gamma\, q = r \,\gamma'\, q$ for every $r, q \in Rel$
for which $r \,\gamma\, q$ is defined.

Second, we would ideally like our generalized operator to give us the
same set of possibilities as the standard operator.

Definition 7.9: Let $\gamma$ be an operator on $Rel$ and let $\gamma'$ be an operator on $Rel^{\uparrow}$.
We say that $\gamma'$ is a precise generalization of $\gamma$ relative to possibility function
$POSS$ if one of the following two conditions holds:

1. when $\gamma$ and $\gamma'$ are unary operators, $POSS(\gamma'(r)) = \gamma(POSS(r))$ for
every $r \in Rel^{\uparrow}$.

2. when $\gamma$ and $\gamma'$ are binary operators, $POSS(r \,\gamma'\, q) = POSS(r) \,\gamma\,
POSS(q)$ for every $r, q \in Rel^{\uparrow}$.

Unfortunately, not all relational operators have a precise
generalization relative to $POSS$. Consider a join operator for $POSS$. It cannot be
precise. For relations $r \in Rel^{\uparrow}(R)$ and $q \in Rel^{\uparrow}(Q)$, $POSS(r) \bowtie POSS(q)$ is a
subset of $SAT(\bowtie[R, Q])$. But, for some relation $p \in Rel^{\uparrow}(RQ)$, $POSS(p)$ is
not a subset of $SAT(\bowtie[R, Q])$. In these cases, we settle for a generalization
of $\gamma$ that captures everything in $\gamma(POSS(r))$ or $POSS(r) \,\gamma\, POSS(q)$ and as
little extra as possible.

Definition 7.10: Let $\gamma$ be an operator on $Rel$ and let $\gamma'$ be an operator on
$Rel^{\uparrow}$. We say that operator $\gamma'$ is adequate for $\gamma$ relative to possibility function
$POSS$ if one of the following two conditions holds:

1. when $\gamma$ and $\gamma'$ are unary operators, $POSS(\gamma'(r)) \supseteq \gamma(POSS(r))$ for
every $r \in Rel^{\uparrow}$.

2. when $\gamma$ and $\gamma'$ are binary operators, $POSS(r \,\gamma'\, q) \supseteq POSS(r) \,\gamma\,
POSS(q)$ for every $r, q \in Rel^{\uparrow}$.

Furthermore, we say that operator $\gamma'$ is restricted for $\gamma$ relative to $POSS$ if one
of the following two conditions holds:

1. when $\gamma$ and $\gamma'$ are unary operators, for every $r \in Rel^{\uparrow}$, there is no $p$
in $Rel^{\uparrow}$ such that $POSS(\gamma'(r)) \supsetneq POSS(p) \supseteq \gamma(POSS(r))$.

2. when $\gamma$ and $\gamma'$ are binary operators, for every $r, q \in Rel^{\uparrow}$, there is
no $p$ in $Rel^{\uparrow}$ such that $POSS(r \,\gamma'\, q) \supsetneq POSS(p) \supseteq POSS(r) \,\gamma\,
POSS(q)$.

Clearly, if $\gamma'$ is precise for $\gamma$, then $\gamma'$ is adequate and restricted for $\gamma$.
We would also like the generalized operators to have properties that the
standard operator possesses, such as commutativity or associativity. For example,
if $\gamma$ is an associative binary operator, we want a generalization $\gamma'$ to satisfy

$(p \,\gamma'\, q) \,\gamma'\, r = p \,\gamma'\, (q \,\gamma'\, r)$

for $p, q, r \in Rel^{\uparrow}$. Finally, we would like the generalized operators to return
only minimal relations given minimal relations as input.


Figure 7-5. Some sample relations.

We now present generalizations for the standard operators, called
null-union, null-difference, null-product, null-select, and null-project (denoted $\cup'$,
$-'$, $\times'$, $\sigma'$, and $\pi'$, respectively), which are faithful, and at least adequate and
restricted, if not precise. Some sample relations are shown in Figure 7-5. These
will be used to illustrate the new operators.


7.1.2.1 Null-union

The null-union of two relations $r$ on scheme $R$ and $q$ on scheme $Q$ in $Rel \cup Rel^{\uparrow}$
is a relation $p$ on scheme $P$ where:

1. $E_P = E_R \cup E_Q$, and

2. $p = r \cup' q = \widehat{\{t \mid t \mathrel{\hat{\in}} r \text{ or } t \mathrel{\hat{\in}} q\}} = \widehat{\{t \mid t \in r \text{ or } t \in q\}}$.

Some examples of null-union are shown in Figure 7-6.
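Read through the form that uses ordinary membership, null-union amounts to collecting the tuples of both relations and applying tuple set reduction. The Python sketch below follows that reading; the dict encoding, the lattice-derived `glb`, the function names, and the sample relations (which are not those of Figure 7-5) are assumptions.

```python
NI, UNK, DNE = "ni", "unk", "dne"  # assumed encodings of the three nulls


def glb(a, b):
    """Greatest lower bound read off the information lattice (an assumption)."""
    if a == b:
        return a
    if NI in (a, b) or DNE in (a, b):
        return NI
    return UNK


def more_informative(s, t):
    """True when tuple s >= tuple t (Definition 7.2)."""
    for attr, value in t.items():
        if attr not in s:
            if value != NI:
                return False
        elif glb(value, s[attr]) != value:
            return False
    return True


def reduce_tuples(tuples):
    """Tuple set reduction: extend with ni, drop null and subsumed tuples."""
    attrs = set().union(*(t.keys() for t in tuples))
    extended = []
    for t in tuples:
        e = {a: t.get(a, NI) for a in attrs}
        if any(v != NI for v in e.values()) and e not in extended:
            extended.append(e)
    return [t for t in extended
            if not any(s is not t and more_informative(s, t) for s in extended)]


def null_union(r, q):
    """Null-union as tuple set reduction applied to the combined tuples."""
    return reduce_tuples(list(r) + list(q))


# Hypothetical relations on (employee, skill).
r = [{"employee": "Smith", "skill": UNK}, {"employee": "Jones", "skill": "filing"}]
q = [{"employee": "Smith", "skill": "typing"}]
p = null_union(r, q)
```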

Proposition  7.1:  The  operator  null-union  is  faithful  to  standard  union. 

Proof: The only difference between the definition of null-union and standard union is that tuple set reduction is applied to the result of a null-union operation. However, since we are dealing with relations in which there are no null values, this extra operation makes no changes. Thus, null-union is faithful to standard union. □

Figure 7-6. Examples of null-union (panels: $r_1 \cup' r_2$ and $r_1 \cup' r_3$).

Proposition 7.2: The operator null-union is a precise generalization of standard union with respect to possibility function POSS.

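Proof: We show inclusion both ways. Let $p = r \cup' q$.

$\supseteq$ Let $\tilde{p} \in \mathit{POSS}(r) \cup \mathit{POSS}(q)$. There must be $\tilde{r} \in \mathit{POSS}(r)$ and $\tilde{q} \in \mathit{POSS}(q)$ such that $\tilde{p} = \tilde{r} \cup \tilde{q}$. Let $t_p$ be a tuple in $p$. Either $t_p \in r$ or $t_p \in q$. If $t_p \in r$, there is a tuple $t_{\tilde{p}} \in \tilde{r}$, and hence in $\tilde{p}$, such that $t_{\tilde{p}} \geq t_p$. A similar argument holds if $t_p \in q$. We conclude $\tilde{p} \geq p$ and so $\tilde{p} \in \mathit{POSS}(p)$. Therefore, $\mathit{POSS}(p) \supseteq \mathit{POSS}(r) \cup \mathit{POSS}(q)$.

$\subseteq$ Let $\tilde{p} \in \mathit{POSS}(p)$. Since $p \geq r$, $\tilde{p} \geq r$ and so $\tilde{p} \in \mathit{POSS}(r)$. Similarly, $\tilde{p} \in \mathit{POSS}(q)$. Therefore, $\tilde{p} \in \mathit{POSS}(r) \cup \mathit{POSS}(q)$, and so $\mathit{POSS}(p) \subseteq \mathit{POSS}(r) \cup \mathit{POSS}(q)$.

We conclude that null-union is a precise generalization of standard union for POSS. □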
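
The definition of null-union can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions that are ours, not the text's: both relations share one scheme, tuples are Python tuples of values, only the ni null is modeled (as `None`), and the helper names `subsumes`, `reduce_tuples` and `null_union` are hypothetical.

```python
NI = None  # assumed stand-in for the no-information null "ni"

def subsumes(t1, t2):
    """True when t1 is at least as informative as t2 (t1 >= t2)."""
    return all(b is NI or a == b for a, b in zip(t1, t2))

def reduce_tuples(tuples):
    """Tuple set reduction: drop the null tuple and every tuple that is
    subsumed by a different tuple of the same set."""
    ts = set(tuples)
    return {
        t for t in ts
        if any(v is not NI for v in t)
        and not any(s != t and subsumes(s, t) for s in ts)
    }

def null_union(r, q):
    """r U' q: ordinary union followed by tuple set reduction."""
    return reduce_tuples(set(r) | set(q))

r = {("Smith", "typing"), ("Jones", NI)}
q = {("Jones", "filing"), (NI, "typing")}
result = null_union(r, q)
print(result)
```

On relations without nulls the reduction step removes nothing, which is the content of Proposition 7.1.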

7.1.2.2 Null-difference

The null-difference of two relations $r$ on scheme $R$ and $q$ on scheme $Q$ in $\mathit{Rel} \cup \mathit{Rel}_{\uparrow}$ is a relation $p$ on scheme $P$ where $E_P = E_R \cup E_Q$ and

$p = r -' q = \widehat{\{t \mid t \mathrel{\hat{\in}} r \text{ and } t \mathrel{\hat{\notin}} q\}} = \{t \mid t \in r \text{ and } \forall s \in q : \neg(s \geq t)\}$.

The definitions of null-union and null-difference were first proposed by Zaniolo [Zan2], who showed the given equivalences. The second equality is preferable because $\hat{\in}$ implies a combinatorial explosion in generated tuples, which are subsequently removed by tuple set reduction, and because tuple set reduction is not needed for difference, as we assume the input relations are minimal.

Some examples of null-difference are shown in Figure 7-7. Null-difference is a faithful, and adequate and restricted, generalization of standard difference. To show that null-difference is not precise, consider a relation $r$ with one non-null tuple and an empty relation $q$. Every relation in $\mathit{POSS}(r -' q)$ must subsume $r$, whereas $\mathit{POSS}(r) - \mathit{POSS}(q)$ is empty. Thus, $\mathit{POSS}(r -' q) \neq \mathit{POSS}(r) - \mathit{POSS}(q)$.

Proposition 7.3: The operator null-difference is faithful to standard difference.

$r_1 -' r_2$

employee   skill
Smith      typing
Jones      filing
Adams      ni

$r_2 -' r_1$

employee   skill
unk        dictation
ni         clerk

Figure 7-7. Examples of null-difference.

Proof: When there are no null values then the only way for one tuple to subsume another is for them to be identical. Thus, in the definition of null-difference the statement $\forall s \in q : \neg(s \geq t)$ reduces to $\forall s \in q : \neg(s = t)$, which is equivalent to $\neg(t \in q)$. With this reduction, we have the standard definition of difference. Thus, null-difference is faithful to standard difference. □


Proposition  7.4:  The  operator  null-difference  is  an  adequate  and  restricted 
generalization  of  standard  difference  with  respect  to  possibility  function  POSS. 


Proof:  We  show  adequate  and  then  restricted. 


adequate: $\mathit{POSS}(r -' q) \supseteq \mathit{POSS}(r) - \mathit{POSS}(q)$.

Let $p = r -' q$, and $\tilde{p} \in \mathit{POSS}(r) - \mathit{POSS}(q)$. Then, $\tilde{p} \in \mathit{POSS}(r)$. Let $t_p$ be a tuple in $p$. Then, $t_p$ must be in $r$. Therefore, there is a tuple $t_{\tilde{p}} \in \tilde{p}$, such that $t_{\tilde{p}} \geq t_p$. We conclude that $\tilde{p} \geq p$ and so $\tilde{p} \in \mathit{POSS}(p)$. Therefore, $\mathit{POSS}(p) \supseteq \mathit{POSS}(r) - \mathit{POSS}(q)$.

restricted: there does not exist $p$ such that $\mathit{POSS}(r -' q) \supsetneq \mathit{POSS}(p) \supseteq \mathit{POSS}(r) - \mathit{POSS}(q)$.

Suppose there is some $p$. If $\mathit{POSS}(r -' q) \supsetneq \mathit{POSS}(p)$, then there must be some tuple $t$ in $p$ that does not subsume any tuple in $r -' q$. This means that the non-null valued attributes $X$ of $t$ do not match any tuple on $X$ in $r -' q$. There are two possible reasons for this: either $t[X] \in r[X]$ and $\exists s \in q : s \geq t$, or $t[X] \notin r[X]$. In each case, any relation in $\mathit{POSS}(p)$ must contain a tuple which subsumes $t$; however, $\mathit{POSS}(r) - \mathit{POSS}(q)$ contains a relation which does not. In the first case, $t$'s possibility can be eliminated by the possibility of $s$ in $q$ that subsumes it, and in the second case, simply consider the possibilities of $r$ that do not include a tuple which subsumes $t$. Therefore, $\mathit{POSS}(p) \not\supseteq \mathit{POSS}(r) - \mathit{POSS}(q)$, which is a contradiction.

We  conclude  that  null-difference  is  an  adequate  and  restricted  generalization 
of  standard  difference  for  POSS.  □ 
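
The second form of the null-difference definition translates directly into code. The following sketch rests on assumptions of our own: a shared scheme, tuples as Python tuples, only the ni null (modeled as `None`), and minimal input relations; `subsumes` and `null_difference` are hypothetical names.

```python
NI = None  # assumed stand-in for the no-information null "ni"

def subsumes(t1, t2):
    """True when t1 is at least as informative as t2 (t1 >= t2)."""
    return all(b is NI or a == b for a, b in zip(t1, t2))

def null_difference(r, q):
    """r -' q: keep the tuples of r that no tuple of q subsumes."""
    return {t for t in r if not any(subsumes(s, t) for s in q)}

r = {("Smith", "typing"), ("Jones", NI), ("Adams", "filing")}
q = {("Jones", "typing"), ("Adams", NI)}
diff = null_difference(r, q)
print(diff)
```

The tuple ( Jones, ni ) is removed because q holds a more informative tuple, while ( Adams, filing ) survives because ( Adams, ni ) does not subsume it. No tuple set reduction is applied, in line with the remark that it is unnecessary when the inputs are minimal.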

We note that a generalized null-intersection operator is not derivable from null-difference alone. Figure 7-8 shows that the usual equivalence

$r_1 \cap' r_2 = r_1 -' (r_1 -' r_2) = r_2 -' (r_2 -' r_1)$

does not hold. However, as pointed out in [Zan2], the following more symmetric definition of intersection in terms of union and difference does carry forward to the null generalizations.

$r_1 \cap' r_2 = (r_1 \cup' r_2) -' ((r_1 -' r_2) \cup' (r_2 -' r_1))$

This result is also shown in Figure 7-8. Note that $r_1 -' r_2$, $r_2 -' r_1$, and $r_1 \cap' r_2$ now appropriately partition $r_1 \cup' r_2$ just as the standard operators do. We note also that null-intersection is an adequate and restricted generalization of standard intersection for POSS.

Figure 7-8. Examples of null-intersection (panels: $r_1 -' (r_1 -' r_2)$; $r_2 -' (r_2 -' r_1)$; $(r_1 \cup' r_2) -' ((r_1 -' r_2) \cup' (r_2 -' r_1))$).
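
The failure of the usual equivalence, and the behavior of the symmetric definition, can be checked on a small case. The relations below are hypothetical and are not those of Figure 7-8; the sketch also assumes a shared scheme, only the ni null (modeled as `None`), and helper names of our own choosing.

```python
NI = None  # assumed stand-in for the no-information null "ni"

def subsumes(t1, t2):
    return all(b is NI or a == b for a, b in zip(t1, t2))

def reduce_tuples(tuples):
    ts = set(tuples)
    return {t for t in ts
            if any(v is not NI for v in t)
            and not any(s != t and subsumes(s, t) for s in ts)}

def union(r, q):
    return reduce_tuples(set(r) | set(q))

def diff(r, q):
    return {t for t in r if not any(subsumes(s, t) for s in q)}

def intersect(r, q):
    """Symmetric definition in terms of null-union and null-difference."""
    return diff(union(r, q), union(diff(r, q), diff(q, r)))

r1 = {("Smith", NI), ("Jones", "typing")}
r2 = {("Smith", "typing"), ("Jones", "typing")}

left = diff(r1, diff(r1, r2))    # r1 -' (r1 -' r2)
right = diff(r2, diff(r2, r1))   # r2 -' (r2 -' r1)
both = intersect(r1, r2)
print(left, right, both)
```

Here the two one-sided forms disagree (the first retains ( Smith, ni ), the second does not), while the three pieces computed with the symmetric definition reassemble the null-union of the inputs.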

7.1.2.3 Null-product

The null-product of two relations is identical to the standard (cartesian) product, as no values are checked in the process. Let $r \in \mathit{Rel}(R) \cup \mathit{Rel}_{\uparrow}(R)$ and $q \in \mathit{Rel}(Q) \cup \mathit{Rel}_{\uparrow}(Q)$, with $E_R \cap E_Q = \emptyset$. Then the null-product of $r$ and $q$ is defined as follows:

$r \times' q = \{t \mid \exists t_r \in r \text{ and } \exists t_q \in q : t[E_R] = t_r \text{ and } t[E_Q] = t_q\}$

Null-product is obviously a faithful and precise generalization of standard product.

7.1.2.4 Null-select

Selection of tuples comes in two flavors: comparison of an attribute value against a non-null constant, and comparison of one attribute value against another. Let $r \in \mathit{Rel}(R) \cup \mathit{Rel}_{\uparrow}(R)$ and let $A \in E_R$. Null-select is defined as follows:

$\sigma'_{A \theta a}(r) = \{t \mid t \in r \text{ and } t[A] \;\theta\; a\}$

$\sigma'_{A \theta B}(r) = \{t \mid t \in r \text{ and } t[A] \;\theta\; t[B]\}$

where $\theta$ is $=$ or $<$. Recall that null values are not equal to each other or to any other domain value, and with this stipulation null-select is essentially identical to standard select.

Null-select is faithful to standard select, but it is not precise. For $\sigma'_{A=a}$, note that for any relation $q \in \sigma_{A=a}(\mathit{POSS}(r))$, every tuple $t \in q$ has $t[A] = a$. For any relation $p$, $\mathit{POSS}(p)$ contains relations whose tuples are not all $a$ on $A$. However, the definitions are adequate and restricted.

Proposition  7.5:  The  operator  null-select  is  faithful  to  standard  select. 

Proof:  As  the  definitions  of  null-select  and  select  are  identical  when  no  null 
values  are  present,  the  result  follows  immediately.  □ 

Proposition 7.6: The operator null-select is an adequate and restricted generalization of standard select with respect to possibility function POSS.

Proof: We show adequate and then restricted. Let $F$ be any selection predicate.

adequate: $\mathit{POSS}(\sigma'_F(r)) \supseteq \sigma_F(\mathit{POSS}(r))$.

Let $p = \sigma'_F(r)$, and $\tilde{p} \in \sigma_F(\mathit{POSS}(r))$. There must be $\tilde{r} \in \mathit{POSS}(r)$ such that $\tilde{p} = \sigma_F(\tilde{r})$. Let $t_p$ be a tuple in $p$. Then, $t_p$ must be in $r$ and satisfy $F$. Therefore, there is a tuple $t_{\tilde{p}} \in \tilde{r}$, such that $t_{\tilde{p}} \geq t_p$, and $t_{\tilde{p}}$ satisfies $F$. We conclude that $\tilde{p} \geq p$ and so $\tilde{p} \in \mathit{POSS}(p)$. Therefore, $\mathit{POSS}(p) \supseteq \sigma_F(\mathit{POSS}(r))$.

restricted: there does not exist $p$ such that $\mathit{POSS}(\sigma'_F(r)) \supsetneq \mathit{POSS}(p) \supseteq \sigma_F(\mathit{POSS}(r))$.

Suppose there is some $p$. If $\mathit{POSS}(\sigma'_F(r)) \supsetneq \mathit{POSS}(p)$, then there must be some tuple $t$ in $p$ that does not subsume any tuple in $\sigma'_F(r)$. This means that the non-null valued attributes $X$ of $t$ do not match any tuple on $X$ in $\sigma'_F(r)$. There are two possible reasons for this: either $t[X] \in r[X]$ and $t$ does not satisfy $F$, or $t[X] \notin r[X]$. In each case, any relation in $\mathit{POSS}(p)$ must contain a tuple which subsumes $t$; however, $\sigma_F(\mathit{POSS}(r))$ contains a relation which does not. In the first case, $t$'s possibility is eliminated by applying the selection predicate, and in the second case, simply consider the possibilities of $r$ that do not include a tuple which subsumes $t$. Therefore, $\mathit{POSS}(p) \not\supseteq \sigma_F(\mathit{POSS}(r))$, which is a contradiction.

We  conclude  that  null-select  is  an  adequate  and  restricted  generalization  of 
standard  select  for  POSS.  □ 
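
A short sketch of the two flavors of null-select follows. It assumes, beyond what the text states, that attributes are addressed by tuple position, that only the ni null is modeled (as `None`), and that a comparison involving a null is never satisfied for either choice of comparison operator; the function names are hypothetical.

```python
import operator

NI = None  # assumed stand-in for the no-information null "ni"

def null_select_const(r, i, theta, a):
    """Select tuples whose attribute at position i compares to constant a."""
    return {t for t in r if t[i] is not NI and theta(t[i], a)}

def null_select_attrs(r, i, theta, j):
    """Select tuples whose attributes at positions i and j compare."""
    return {t for t in r
            if t[i] is not NI and t[j] is not NI and theta(t[i], t[j])}

r = {("Smith", "Smith"), ("Jones", NI), (NI, NI), ("Adams", "Brown")}

same = null_select_attrs(r, 0, operator.eq, 1)
smith = null_select_const(r, 0, operator.eq, "Smith")
before = null_select_attrs(r, 0, operator.lt, 1)
print(same, smith, before)
```

Note that ( ni, ni ) is not selected by the equality test, since nulls are not equal to each other.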

7.1.2.5 Null-project

While standard projection eliminates duplicate tuples from the reduced relation, null-projection eliminates less informative tuples. Let $r \in \mathit{Rel}(R) \cup \mathit{Rel}_{\uparrow}(R)$ and let $A_1, A_2, \ldots, A_n \in E_R$. Null-project is defined as follows:

$\pi'_{A_1 A_2 \cdots A_n}(r) = \widehat{\{t[A_1 A_2 \cdots A_n] \mid t \in r\}}$

Examples of null-project are shown in Figure 7-9.

Figure 7-9. Examples of null-project.

Proposition  7.7:  The  operator  null-project  is  faithful  to  standard  project. 

Proof: As in the proof of Proposition 7.1, without null values, tuple set reduction has no effect on the resulting relation, making the definitions of null-project and standard project identical. □

Proposition 7.8: The operator null-project is a precise generalization of standard project with respect to possibility function POSS.

Proof: We show inclusion both ways. Let $p = \pi'_X(r)$, where $X$ are the attributes being projected.

$\supseteq$ Let $\tilde{p} \in \pi_X(\mathit{POSS}(r))$. There must be $\tilde{r} \in \mathit{POSS}(r)$ such that $\tilde{p} = \pi_X(\tilde{r})$. Let $t_p$ be a tuple in $p$. Then, $t_p \in r[X]$. Since $t_p \in r[X]$, there is a tuple $t_{\tilde{p}} \in \tilde{r}[X]$, and hence in $\tilde{p}$, such that $t_{\tilde{p}} \geq t_p$. We conclude $\tilde{p} \geq p$ and so $\tilde{p} \in \mathit{POSS}(p)$. Therefore, $\mathit{POSS}(p) \supseteq \pi_X(\mathit{POSS}(r))$.

$\subseteq$ Let $\tilde{p} \in \mathit{POSS}(p)$. Consider each tuple $t_p \in p$. Each $t_p$ is in $r[X]$ and possibly eliminated some other tuples in $r[X]$ since $t_p$ subsumed them. We then construct the following relation in $\mathit{POSS}(r)$: for each tuple in $r$ whose projection is $t_p$, and for the tuples in $r$ whose projection it subsumes, make the same assignment to the null values in attributes $X$ that was made in constructing $\tilde{p}$. By making the same assignment, in the projection $\pi_X(\mathit{POSS}(r))$, the tuples which were subsumed in the null-project will be duplicates and thus eliminated in both cases. Therefore, $\tilde{p} \in \pi_X(\mathit{POSS}(r))$, and so $\mathit{POSS}(p) \subseteq \pi_X(\mathit{POSS}(r))$.

We conclude that null-project is a precise generalization of standard project for POSS. □
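
Null-project can be sketched as an ordinary projection followed by tuple set reduction. The sketch assumes that attributes are addressed by tuple position and that only the ni null is modeled (as `None`); the sample relation and the helper names are hypothetical.

```python
NI = None  # assumed stand-in for the no-information null "ni"

def subsumes(t1, t2):
    return all(b is NI or a == b for a, b in zip(t1, t2))

def reduce_tuples(tuples):
    """Tuple set reduction: drop the null tuple and subsumed tuples."""
    ts = set(tuples)
    return {t for t in ts
            if any(v is not NI for v in t)
            and not any(s != t and subsumes(s, t) for s in ts)}

def null_project(r, positions):
    """Project onto the given positions, then reduce the tuple set."""
    return reduce_tuples({tuple(t[i] for i in positions) for t in r})

r = {
    ("Smith", "typing", "Austin"),
    ("Smith", NI, "Waco"),
    (NI, NI, "Dallas"),
}
projected = null_project(r, (0, 1))
print(projected)
```

The projection of the second tuple is subsumed by that of the first, and the projection of the third is the null tuple, so a single tuple remains.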

7.1.2.6 Join

As in the case of the standard operators, the various $\theta$-joins can be defined as selections on a cartesian product. In our case,

$r \bowtie'_{A \theta B} q = \sigma'_{A \theta B}(r \times' q)$.

As in [Zan2, LaP], the use of null values allows the definition of new information-preserving joins (also called outer joins) which include tuples that normally do not participate in the join. An information-preserving equijoin is defined by

$(r \bowtie'_{A = B} q) \cup' r \cup' q$.

Figure 7-10 shows an example of the information-preserving equijoin of $r_2$ and $r_4$.

Figure 7-10. Information preserving equijoin.
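
The information-preserving equijoin can be sketched as follows. The relations are hypothetical (they are not $r_2$ and $r_4$ of Figure 7-10), only the ni null is modeled (as `None`), and we assume that the operands of the null-union are first padded with ni values to the combined scheme; the function and parameter names are ours.

```python
NI = None  # assumed stand-in for the no-information null "ni"

def subsumes(t1, t2):
    return all(b is NI or a == b for a, b in zip(t1, t2))

def reduce_tuples(tuples):
    ts = set(tuples)
    return {t for t in ts
            if any(v is not NI for v in t)
            and not any(s != t and subsumes(s, t) for s in ts)}

def outer_equijoin(r, i, q, j, r_width, q_width):
    """(r join q on r[i] = q[j]) U' r U' q over the combined scheme."""
    joined = {tr + tq for tr in r for tq in q
              if tr[i] is not NI and tq[j] is not NI and tr[i] == tq[j]}
    pad_r = {tr + (NI,) * q_width for tr in r}   # r extended with ni
    pad_q = {(NI,) * r_width + tq for tq in q}   # q extended with ni
    return reduce_tuples(joined | pad_r | pad_q)

skills = {("Smith", "typing"), ("Jones", "filing")}
children = {("Smith", "Sue"), ("Adams", "Joe")}
result = outer_equijoin(skills, 0, children, 0, 2, 2)
print(result)
```

Tuples that take part in the join are subsumed by their joined form and disappear under tuple set reduction; the others survive padded with ni.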

7.2 Introducing Null Values into the ¬1NF Relational Model

Previous treatments of nulls in ¬1NF relations have been either ambiguous or incomplete. One source of concern is the effect of the unnest operator on empty sets. As defined in Chapter 4, unnest produces a flatter relation structure with each element of the unnested set forming a value in a separate tuple in the flatter relation. When the set is empty, it is not clear what this operation means. Schek states, “In the general case unnest on empty relations will produce undefined attribute values” [Sch2;180]. However, if the empty set has a meaning in the relation, then whatever it unnests to should have meaning also. In the VERSO model [AB1], empty sets are used as null values for set-valued attributes. However, nulls are not allowed for atomic-valued attributes. Thus, when an empty set is unnested the entire tuple is deleted from the resulting relation.


Two researchers have assigned the non-existent interpretation to the empty set. One of Makinouchi’s properties of “not-necessarily-normalized” relations is that “A null set ($\emptyset$) may be in the domain of a relation column. $\emptyset$ means exactly non-existence” [Mak;448]. In deriving an extended set-containment operation for 1NF relations with non-existent nulls, Zaniolo [Zan1] discusses the ¬1NF viewpoint. In this development, he assigns the non-existence meaning to the empty set, viewing the non-existent null as the image of an empty set when mapping from an unnormalized relation to a normalized one.

We believe that the correct interpretation for the empty set is the no-information one. We have already seen in the definition of tuple set reduction that the null tuple is eliminated from any relation even if it is the only tuple in the relation. So, in the simplest case of a relation with one attribute, we have that the empty relation is equivalent to the relation containing only the tuple ( ni ). This is consistent with the open world assumption we have been making, in which we do not assume that the empty relation indicates that no tuples belong in the relation but that we currently have no information about the world and so we do not know if the tuples belong or not. As we will see, this means an empty nested relation should unnest to a no-information, null

7.2.1 Basic Concepts

When nulls are introduced into our model, the concept of more informative (or subsumes) must be extended to handle nested relations. The main idea is to treat nested relations as values which must be more informative than the corresponding nested relation in the less informative tuple. In addition, a null tuple which consists of all ni values in the 1NF model is extended in the ¬1NF model so that all zero order attributes have ni values and all higher order attributes are empty or, equivalently, contain exactly one null tuple. Thus, our new definition of more informative, which includes the old one as a special case, is as follows.

Definition 7.11: Let $t_1$ be a tuple on zero order attributes $X_1$ and higher order attributes $Y_1$, and let $t_2$ be a tuple on zero order attributes $X_2$ and higher order attributes $Y_2$. The tuple $t_1$ is said to be more informative than the tuple $t_2$ when:

(a) for each $B \in X_2$, if $t_2[B]$ is not ni then $B \in X_1$,

(b) for each $C \in Y_2$, if $t_2[C]$ contains a tuple that is not null then $C \in Y_1$,

(c) for each $A \in X_1 \cap X_2$, $t_1[A] \geq t_2[A]$, and

(d) for each $D \in Y_1 \cap Y_2$ and tuple $u_2 \in t_2[D]$, there exists some tuple $u_1 \in t_1[D]$ which is more informative than $u_2$.

Figure 7-11. A sample relation on the Emp scheme.

Example 7.3: Recall the Emp scheme and sample relation (shown again in Figure 7-11) introduced in Chapter 5. If a new employee, say Jones, is added to the database and we do not know anything about him except his name, then we would add the tuple ( Jones, {}, {} ), or, equivalently, ( Jones, {( ni, ni )}, {( ni, {( ni, ni )} )} ). If we find out later that Jones has no children and has some skill for which he took a 1981 exam, we could update the tuple to ( Jones, {( dne, dne )}, {( unk, {( 1981, unk )} )} ). □

There is an aspect of our definition of more informative which goes beyond nulls. Consider the following tuple

( Smith, {( Sam, 2/10/84 )}, {( ni, {( ni, ni )} )} ).


According to Definition 7.11, this tuple is less informative than the one in Figure 7-11. Note that the Children attribute in the original “Smith” tuple is a nested relation with two tuples while in the new tuple only one of the Children tuples exists. This reasoning stems from our interpretation of the relationship between the attributes in ¬1NF relations. Nested relations are not non-decomposable values, so it is the tuples of the nested relation that are related to the other attributes. Thus an employee is related to each child and there is no particular significance to sets of children. Similar reasoning about the significance of sets led to our definition of PNF. However, the requirement of PNF is a somewhat different notion than that of subsumption, as the following example shows.

Example 7.4: Let $t_1$ = ( Smith, {( Sam ), ( Sue )} ) and $t_2$ = ( Smith, {( Sue ), ( Bill )} ) be tuples from a projected employee relation. We have that $t_1 \not\geq t_2$ and $t_2 \not\geq t_1$, but under PNF $t_1$ and $t_2$ would be combined into $t_3$ = ( Smith, {( Sam ), ( Sue ), ( Bill )} ). □
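
Conditions (c) and (d) of Definition 7.11 lend themselves to a recursive check. The sketch below makes several assumptions of its own: both tuples are already defined on the same scheme, only the ni null is modeled (as `None`), nested relations are `frozenset` values, and null tuples inside a nested relation are ignored so that an empty nested relation and one holding a single null tuple behave alike. The function names are hypothetical.

```python
NI = None  # assumed stand-in for the no-information null "ni"

def is_null_value(v):
    """A value is null if it is ni or a nested relation of null tuples."""
    if isinstance(v, frozenset):
        return all(is_null(u) for u in v)
    return v is NI

def is_null(t):
    return all(is_null_value(v) for v in t)

def more_informative(t1, t2):
    """True when t1 >= t2 for tuples on the same (possibly nested) scheme."""
    for a, b in zip(t1, t2):
        if isinstance(b, frozenset):
            # every non-null tuple of t2's nested relation needs a
            # more informative counterpart in t1's nested relation
            if not all(is_null(u2) or any(more_informative(u1, u2) for u1 in a)
                       for u2 in b):
                return False
        elif b is not NI and a != b:
            return False
    return True

t1 = ("Smith", frozenset({("Sam",), ("Sue",)}))
t2 = ("Smith", frozenset({("Sue",), ("Bill",)}))
t3 = ("Smith", frozenset({("Sam",), ("Sue",), ("Bill",)}))
print(more_informative(t1, t2), more_informative(t2, t1))
print(more_informative(t3, t1), more_informative(t3, t2))
```

As in Example 7.4, neither $t_1$ nor $t_2$ subsumes the other, while the combined tuple $t_3$ subsumes both.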

The definitions of x-element ($\hat{\in}$), and tuple set reduction ($\widehat{\{\text{set of tuples}\}}$), from section 7.1, carry over to ¬1NF in a straightforward manner. However, the meet of two ¬1NF tuples must be extended to handle nested relations. This can be done using the glb function for zero order attributes and applying the definition recursively for higher order attributes.

Definition 7.12: Let $U$ be the attributes on which two tuples $t_1$ and $t_2$ are defined, where $t_1$ and $t_2$ have been extended to $U$ with the addition of ni values for zero order attributes and single null tuple relations for higher order attributes, if necessary. A tuple $t$ is the meet of $t_1$ and $t_2$, written $t_1 \wedge t_2$, when for each zero order attribute $A \in U$, $t[A] = \mathit{glb}(t_1[A], t_2[A])$, and for each higher order attribute $X \in U$, $t[X] = \{s \wedge u \mid s \in t_1[X] \text{ and } u \in t_2[X]\}$.
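
The recursive structure of Definition 7.12 is easy to see in code. In this sketch only the ni null is modeled (as `None`), so the glb of two zero order values is taken to be the common value when they agree and ni otherwise; this is a simplification of the glb function of section 7.1. Tuples are assumed to be already extended to the same scheme, and nested relations are `frozenset` values.

```python
NI = None  # assumed stand-in for the no-information null "ni"

def glb(a, b):
    """Assumed glb for a domain extended with ni only."""
    return a if a == b else NI

def meet(t1, t2):
    """t1 ^ t2: glb on zero order attributes, pairwise meets on nested ones."""
    out = []
    for a, b in zip(t1, t2):
        if isinstance(a, frozenset):
            out.append(frozenset(meet(s, u) for s in a for u in b))
        else:
            out.append(glb(a, b))
    return tuple(out)

t1 = ("Smith", frozenset({("Sam",), ("Sue",)}))
t2 = ("Smith", frozenset({("Sue",), ("Bill",)}))
m = meet(t1, t2)
print(m)
```

The nested relation of the meet contains ( Sue ) together with the null tuple produced by the non-matching pairs; no tuple set reduction is applied here, since the definition as stated does not call for it.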

Finally, the ideas of more informative relations, information-wise equivalence and minimal representations for a relation all have the same definitions when we substitute the ¬1NF version of subsumption.

7.2.2 Operators for ¬1NF Relations with Nulls

Since the mapping between 1NF and ¬1NF relations is an important one, we need to revise the definitions of nest and unnest to deal with the presence of null values. For nest, we deal with the problem of null values for the partitioning attributes (the attributes not being nested), and for unnest we deal with subsumption and possible loss of information. Once this is dealt with, we provide, where possible, precise extensions to the ¬1NF operators defined in Chapter 5 accommodating null values. Once again we will work only on relations in PNF. However, our definition of PNF relies on the definition of functional dependency in which we test equality of attribute values, and therefore, we need to specify how null values should be treated. For purposes of testing for equality, ni $\neq$ ni, unk $\neq$ unk, and dne = dne. The intuition behind this will be discussed in what follows.

7.2.2.1 Null-nest

When null values occur as values of attributes which are being nested, then no special rules need apply. We could use tuple set reduction on each nested relation, but if we assume that the input relation is minimal then the new relation and its new nested relations will all be minimal as well. Problems in the standard definition of nest arise when nulls are values of the partitioning attributes. The question is whether we equate nulls for partitioning purposes. At first glance, equating nulls would be advantageous in that we could have a succinct notation for grouping all values for which we do not have a fully defined partition value. However, doing this grouping would give the impression that one value could replace the null for all members of the group. Since this is not generally true, we should not equate no-information and unknown nulls when partitioning the relation. The does not exist null is a special case though. Since there is no value which can replace a dne null, it is appropriate to nest all tuples which have that property together. Thus, our definition of null-nest is not different from standard nest except that two attribute values are considered equal iff they are both the same domain value or they are both dne nulls.

Example 7.5: Consider the 1NF relation of Figure 7-12a. Suppose that we want to nest all courses taught by each teacher. For the two “Smith” tuples the standard nest applies and we get the single tuple with “Math1” and “Math2” together in a nested relation. The same applies to the two tuples with dne nulls. These two tuples indicate that “Math5” and “Math6” are courses that exist, but there are no teachers teaching them, so we can group these courses together as courses for which there is no teacher. If we find that our information was wrong and “Math5” does have a teacher then we would be forced to update this tuple just as if we found out that “Smith” is not really teaching “Math2”. Finally, the two tuples with ni nulls are nested singly, since we have no assurance that they will be in the same partition when more information is found out about them. In this case, the two courses may be newly added ones, for which we know nothing about who will teach them or even if they will be taught. Figure 7-12b shows the nested relation. □

(a)

teacher   course
Smith     Math1
Smith     Math2
dne       Math5
dne       Math6
ni        Science1
ni        Science2

(b)  $\nu_{courses=(course)}$

teacher   courses (course)
Smith     {Math1, Math2}
dne       {Math5, Math6}
ni        {Science1}
ni        {Science2}

Figure 7-12. Example of nest with null values.
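
The partitioning rule of null-nest can be sketched for the two-attribute case of Figure 7-12. The representation is our own assumption: the three nulls are members of a hypothetical `Null` enumeration, a flat tuple is a (teacher, course) pair, and a nested tuple pairs a teacher with a `frozenset` of courses.

```python
from enum import Enum

class Null(Enum):
    """Assumed sentinels for the three kinds of null."""
    NI = "ni"
    UNK = "unk"
    DNE = "dne"

def null_nest(r):
    """Nest courses by teacher: domain values and dne nulls are grouped,
    while each ni or unk teacher forms a partition of its own."""
    groups = {}
    result = set()
    for teacher, course in r:
        if teacher in (Null.NI, Null.UNK):
            result.add((teacher, frozenset({course})))
        else:
            groups.setdefault(teacher, set()).add(course)
    result |= {(t, frozenset(cs)) for t, cs in groups.items()}
    return result

flat = {
    ("Smith", "Math1"), ("Smith", "Math2"),
    (Null.DNE, "Math5"), (Null.DNE, "Math6"),
    (Null.NI, "Science1"), (Null.NI, "Science2"),
}
nested = null_nest(flat)
for row in sorted(nested, key=str):
    print(row)
```

Applied to the relation of Figure 7-12a, this yields the four tuples of Figure 7-12b.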

Before we consider the preciseness of the null-nest operator, we introduce a modified possibility function to deal with PNF relations. Consider the nested relation of Example 7.5. Using our current definition of POSS, one possibility for this relation is constructed by replacing the ni nulls with the same value, say “Jones.” As a result, we no longer have a PNF relation. An alternative possibility, representing the same information, is constructed by replacing the ( ni, {( Science1 )} ) and ( ni, {( Science2 )} ) tuples with the single tuple, ( Jones, {( Science1 ), ( Science2 )} ). This possibility also satisfies the current definition of POSS, but the resulting relation is in PNF. Therefore, we will use a modified definition of POSS, so that only PNF relations are allowed. The set of PNF possibilities for relation $r$ on scheme $R$ is denoted $\mathit{POSS}^{*}(r)$, and is defined as:

$\mathit{POSS}^{*}(r) = \{q \mid q \in \mathit{Rel}^{*}(R) \cup \mathit{Rel}(R) \text{ and } q \geq r \text{ and } q \text{ is in PNF}\}$.
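
The two possibilities discussed above can be contrasted in a small sketch. It assumes a one-level nested relation of (teacher, courses) pairs in which the teacher attribute alone must act as a key for PNF to hold; ni is modeled as `None`, and the helper names are hypothetical.

```python
NI = None  # assumed stand-in for the no-information null "ni"

def is_pnf(rel):
    """For this one-level scheme, PNF holds when teacher values are unique."""
    teachers = [teacher for teacher, _ in rel]
    return len(teachers) == len(set(teachers))

def substitute(rel, value):
    """Replace every ni teacher with the same value."""
    return {(value if t is NI else t, cs) for t, cs in rel}

def merge_on_key(rel):
    """Combine tuples that agree on teacher by uniting their courses."""
    merged = {}
    for teacher, courses in rel:
        merged[teacher] = merged.get(teacher, frozenset()) | courses
    return set(merged.items())

nested = {
    ("Smith", frozenset({"Math1", "Math2"})),
    (NI, frozenset({"Science1"})),
    (NI, frozenset({"Science2"})),
}
naive = substitute(nested, "Jones")
merged = merge_on_key(naive)
print(is_pnf(naive), is_pnf(merged))
```

Only the merged relation would be admitted as a member of $\mathit{POSS}^{*}$ under the definition above.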

Proposition 7.9: Null-nest is a precise generalization of standard nest with respect to PNF possibility function $\mathit{POSS}^{*}$.

Proof: Let $X$ be the attributes of $r$ being nested. We show that $\mathit{POSS}^{*}(\nu'_{B=(X)}(r)) = \nu_{B=(X)}(\mathit{POSS}^{*}(r))$. We show inclusion both ways. Let $p = \nu'_{B=(X)}(r)$.

$\subseteq$ Let $\tilde{p} \in \mathit{POSS}^{*}(p)$. There are two cases depending on the assignment by $\mathit{POSS}^{*}$ to null values in the partition keys of $p$. In the first case, if $\mathit{POSS}^{*}$ assigned the same value to nulls in otherwise equal partition keys of $p$, then these tuples will be combined by the PNF requirement of $\mathit{POSS}^{*}$. By making this same assignment of nulls directly to $r$, nesting will also combine these tuples. In the second case, if we make the same assignment to nulls in $p$ and in $r$, then nesting on $\mathit{POSS}^{*}(r)$ will also produce $\tilde{p}$. Thus, $\tilde{p} \in \nu_{B=(X)}(\mathit{POSS}^{*}(r))$.

$\supseteq$ Let $\tilde{p} \in \nu_{B=(X)}(\mathit{POSS}^{*}(r))$. There must be $\tilde{r} \in \mathit{POSS}^{*}(r)$ such that $\tilde{p} = \nu_{B=(X)}(\tilde{r})$. Consider the assignment of values made by $\mathit{POSS}^{*}$ in $\tilde{r}$. If we, in $\mathit{POSS}^{*}(p)$, make the same assignment to the corresponding nulls in $p$, then we also get $\tilde{p}$. Thus, $\tilde{p} \in \mathit{POSS}^{*}(p)$.

We conclude that null-nest is a precise generalization of standard nest for $\mathit{POSS}^{*}$. □

7.2.2.2 Null-unnest

If nested relations are inserted into our database solely by application of the nest operator to relations in 1NF, then the standard definition of unnest can apply to relations with nulls and there are no problems. However, if we allow arbitrary nested relations then unnesting can produce non-minimal relations and cause loss of information.

Example 7.6: Recalling the database scheme of the previous example, consider a relation $r$ with two tuples $t_1$ = ( Jones, {( Math ), ( Science )} ) and $t_2$ = ( ni, {( Math ), ( English )} ). If we unnest $r$, then the resulting ( ni, Math ) tuple is less informative than the ( Jones, Math ) tuple. Thus, even though $t_1$ and $t_2$ form a minimal relation, their unnested counterparts do not. □
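
Example 7.6 can be replayed in a few lines. The sketch assumes a one-level nested relation of (teacher, courses) pairs, models only the ni null (as `None`), and uses hypothetical helper names.

```python
NI = None  # assumed stand-in for the no-information null "ni"

def subsumes(t1, t2):
    return all(b is NI or a == b for a, b in zip(t1, t2))

def unnest(rel):
    """Standard unnest of the courses attribute."""
    return {(teacher, course) for teacher, courses in rel for course in courses}

def subsumed_tuples(flat):
    """Tuples of a flat relation that a different tuple subsumes."""
    return {t for t in flat if any(s != t and subsumes(s, t) for s in flat)}

def is_minimal(flat):
    return not subsumed_tuples(flat)

r = {
    ("Jones", frozenset({"Math", "Science"})),
    (NI, frozenset({"Math", "English"})),
}
flat = unnest(r)
print(flat)
print(is_minimal(flat), subsumed_tuples(flat))
```

The unnested relation holds four tuples, of which ( ni, Math ) is subsumed by ( Jones, Math ), so the result is not minimal.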

The problem with arbitrary ¬1NF relations is that they allow the misuse of ni and unk nulls in the partition attributes. Our previous discussion of the nest operator showed that when an ni or a unk null is in one of the partition attributes, then the nested relation should have cardinality of one. But, one can argue that we may know that, say, two tuples are both related to one undetermined value and we should take advantage of that fact and store those two tuples in the same nested relation. If this is true, then an answer is to use marked ni and unk nulls [Sci2]. Then a tuple can be subsumed only if its marked nulls do not exist in any tuple other than the subsuming tuple. Using marked nulls also avoids some loss of information. In the previous example, if we unnest $r$ and then perform the reverse nest operation, we would find three tuples in the result, as the tuples with ni as the teacher value would not be nested together as per our previous arguments. It would be appropriate to equate identical marked nulls and so a nest would return the original relation.

Another reason for our treatment of ni and unk is so that null-unnest is a precise generalization of the standard operator. In Example 7.6, every relation in $\mu_{courses}(\mathit{POSS}^{*}(r))$ must contain ( x, Math ) and ( x, English ) for some value x. However, there are relations in $\mathit{POSS}^{*}(\mu'_{courses}(r))$ which do not have both of these tuples for some value x. So, under the assumption that tuples with ni or unk nulls in the partition attributes of a relation (nested or otherwise) have only single tuple nested relations for each higher order attribute, our definition of null-unnest is unchanged from the standard unnest definition. Furthermore, we can prove that null-unnest is a precise generalization.

Proposition  7.10:  Null-unnest  is  a  precise  generalization  of  standard  unnest 
with  respect  to  PNF  possibility  function  POSS *. 
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Proof:  We  show  that  POSS*(p'B[r))  =  pB(POSS*{r)).  Let  p  =  p'B(r). 

C  Let  p  G  POS S*(p).  If  we  make  the  same  assignment  to  the  nulls  in  p 
as  in  the  nested  relation  r  then  p  G  (j>b{POS S" (r)).  This  is  possible 
since  we  assume  that  tuples  in  r  with  null  values  in  the  partition  keys 
have  single  tuple  nested  relations.  Therefore,  there  is  a  one-to-one 
correspondence  between  these  null  values  in  both  r  and  p. 

5  Let  p  G  HB(POSS*(r)).  Then  there  must  be  f  G  POSS*(r)  such  that 
p  =  Pb  (?).  Let  tp  be  a  tuple  in  p.  Now,  tp  unnested  from  some  tuple 
t,  in  r,  which  has  some  PNF  possibility  such  that  t ?  >  tr.  Let 

=  PB{h).  Then,  we  have  >  tp.  We  conclude  that  p  >  p  and  so 
p€POSS*{p). 

We  conclude  that  null-unnest  is  a  precise  generalization  of  standard  unnest  for 
POSS\  □ 

With this result we can now show that the null-unnest* operator ($\mu'^{*}$)
is a precise generalization of the standard unnest* operator.

Corollary 7.1: Null-unnest* is a precise generalization of standard unnest*
with respect to PNF possibility function $POSS^*$.


Proof:  Apply  the  same  argument  as  for  Proposition  7.10,  only  use  complete 
unnesting  instead  of  single  unnesting.  □ 
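
As a reminder of what the standard operators do on null-free data, the following Python fragment is a minimal sketch of unnest and unnest*. It assumes a nested relation is a list of dictionaries whose relation-valued attributes are Python lists; the function names and the sample rows are hypothetical and are not taken from Example 7.6. Nulls, empty nested relations, and the single-tuple assumption used above are not modeled.

```python
def unnest(relation, attr):
    """Standard unnest of one relation-valued attribute.

    Each outer tuple is paired with every tuple of its nested relation.
    Duplicates are dropped because relations are sets.
    """
    out = []
    for t in relation:
        for inner in t[attr]:
            flat = {a: v for a, v in t.items() if a != attr}
            flat.update(inner)
            if flat not in out:
                out.append(flat)
    return out


def unnest_all(relation):
    """unnest*: keep unnesting until no relation-valued attribute remains."""
    while True:
        nested = [a for t in relation for a, v in t.items() if isinstance(v, list)]
        if not nested:
            return relation
        relation = unnest(relation, nested[0])


# Hypothetical nested relation: each student has a nested relation of courses.
r = [
    {"student": "s1", "courses": [{"course": "Math"}, {"course": "English"}]},
    {"student": "s2", "courses": [{"course": "Math"}]},
]
flat = unnest_all(r)
```

Here `flat` is a 1NF relation over the attributes student and course, with one tuple per (student, course) pair.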


7.2.3 Null-extended Operators


Let $Rel{\uparrow}^{*}$ represent the set of all relations which are not in 1NF
or which contain a null value. Thus, $Rel^{*} \cup Rel{\uparrow} =
Rel{\uparrow}^{*}$ and $Rel \cap Rel{\uparrow}^{*} = \emptyset$. Our goal is to
generalize the ¬1NF operators to deal with null values. We have two choices
for our definition of a precise generalization for the operators. We can
either apply the PNF possibility function first and then unnest the result or
we can unnest first and then apply the PNF possibility function, resulting in
the following two definitions.

Definition 7.13: Let $\gamma$ be an operator on $Rel$ and let $\gamma^{e'}$ be
an operator on $Rel{\uparrow}^{*}$. We say that $\gamma^{e'}$ is a precise
generalization of $\gamma$ relative to unnesting and PNF possibility function
$POSS^*$ if one of the following two conditions holds:

1. when $\gamma$ and $\gamma^{e'}$ are unary operators,
   $\mu^*(POSS^*(\gamma^{e'}(r))) = \gamma(\mu^*(POSS^*(r)))$ for every
   $r \in Rel{\uparrow}^{*}$ for which $\gamma^{e'}(r)$ is defined.

2. when $\gamma$ and $\gamma^{e'}$ are binary operators,
   $\mu^*(POSS^*(r \;\gamma^{e'}\; q)) = \mu^*(POSS^*(r)) \;\gamma\;
   \mu^*(POSS^*(q))$ for every $r, q \in Rel{\uparrow}^{*}$ for which
   $r \;\gamma^{e'}\; q$ is defined.

Definition 7.14: Let $\gamma$ be an operator on $Rel$ and let $\gamma^{e'}$ be
an operator on $Rel{\uparrow}^{*}$. We say that $\gamma^{e'}$ is a precise
generalization of $\gamma$ relative to unnesting and PNF possibility function
$POSS^*$ if one of the following two conditions holds:

1. when $\gamma$ and $\gamma^{e'}$ are unary operators,
   $POSS^*(\mu'^{*}(\gamma^{e'}(r))) = \gamma(POSS^*(\mu'^{*}(r)))$ for every
   $r \in Rel{\uparrow}^{*}$ for which $\gamma^{e'}(r)$ is defined.

2. when $\gamma$ and $\gamma^{e'}$ are binary operators,
   $POSS^*(\mu'^{*}(r \;\gamma^{e'}\; q)) = POSS^*(\mu'^{*}(r)) \;\gamma\;
   POSS^*(\mu'^{*}(q))$ for every $r, q \in Rel{\uparrow}^{*}$ for which
   $r \;\gamma^{e'}\; q$ is defined.

Theorem  7.1:  Definition  7.13  and  Definition  7.14  are  equivalent. 

Proof: By Corollary 7.1, we know that null-unnest* is a precise generalization
of standard unnest* for $POSS^*$. Thus, the definitions are equivalent. □

There are corresponding definitions of adequate and restricted for
$Rel{\uparrow}^{*}$, and there are three specifications of faithfulness we
could use: comparing relations in $Rel{\uparrow}^{*}$ to relations in $Rel$,
$Rel{\uparrow}$, and $Rel^{*}$. As in the previous section, proofs of
faithfulness are straightforward and so we shall omit them here.

7.2.3.1 Null-extended union

Our  definition  of  null-extended  union  can  be  revised  to  accommodate  nulls  by 
adding  tuple  set  reduction  as  follows. 

Definition 7.15: In order to take the null-extended union of two relations
$r_1$ and $r_2$ we require that they have equal relation schemes, say $R$. The
scheme of the resultant structure is also $R$. We define null-extended union
at the instance level as follows. Let $X$ range over the zero order attributes
in $E_R$ and $Y$ range over the higher order attributes in $E_R$. The
null-extended union of $r_1$ and $r_2$ is:

\[
\begin{aligned}
r_1 \cup^{e'} r_2 = \{\, t \mid\;
  & (\exists t_1 \in r_1 \wedge \exists t_2 \in r_2 :
     (\forall X, Y \in E_R : t[X] = t_1[X] = t_2[X]
      \wedge t[Y] = (t_1[Y] \cup^{e'} t_2[Y]))) \\
  & \vee\; (t \in r_1 \wedge (\forall t' \in r_2 :
     (\forall X \in E_R : t[X] \neq t'[X]))) \\
  & \vee\; (t \in r_2 \wedge (\forall t' \in r_1 :
     (\forall X \in E_R : t[X] \neq t'[X]))) \,\}
\end{aligned}
\]

Note, this definition is recursive in that we apply the null-extended union to
each higher order attribute $Y$.

As for extended union (section 5.2.1), we require the use of the A-union
operator which maintains the join dependency involving the path set of the
¬1NF relation’s scheme tree.

Proposition 7.11: Null-extended union is a precise generalization of A-union
with respect to unnesting and PNF possibility function $POSS^*$, where the join
dependency used in the A-union is the path set of the ¬1NF relation’s scheme
tree.

Proof: We show that $\mu^*(POSS^*(r \cup^{e'} q)) = \mu^*(POSS^*(r)) \cup^{A}
\mu^*(POSS^*(q))$. By Proposition 5.3, we know that extended union is a
precise generalization of A-union, and so $\mu^*(POSS^*(r)) \cup^{A}
\mu^*(POSS^*(q)) = \mu^*(POSS^*(r) \cup^{e} POSS^*(q))$. Thus, we only need to
show that $POSS^*(r \cup^{e'} q) = POSS^*(r) \cup^{e} POSS^*(q)$. We show
inclusion both ways. Let $p = r \cup^{e'} q$.

$\supseteq$ Let $\hat{p} \in POSS^*(r) \cup^{e} POSS^*(q)$. There must be
   $\hat{r} \in POSS^*(r)$ and $\hat{q} \in POSS^*(q)$ such that
   $\hat{p} = \hat{r} \cup^{e} \hat{q}$. Let $t_p$ be a tuple in $p$. Either
   $t_p \in r$, $t_p \in q$, or $t_p$ is a combination of tuples in $r$ and
   $q$ with equal partition keys. If $t_p \in r$, there is a tuple
   $t_{\hat{r}} \in \hat{r}$ such that $t_{\hat{r}} \geq t_p$. Now,
   $t_{\hat{r}}$ is either in $\hat{p}$ or is included in a combined tuple of
   $\hat{p}$, since the null values of some partition key may have been
   assigned values that make the partition key non-unique. In any case, this
   tuple subsumes $t_p$. A similar argument can be made if $t_p \in q$. If
   $t_p$ is a combination of tuples in $r$ and $q$, then there are no null
   values in the outermost partition key. Therefore, in $\hat{p}$, these
   tuples will also combine, and there is a possibility which subsumes $t_p$.
   We conclude $\hat{p} \geq p$, and so $\hat{p} \in POSS^*(p)$. Therefore,
   $POSS^*(p) \supseteq POSS^*(r) \cup^{e} POSS^*(q)$.

$\subseteq$ Let $\hat{p} \in POSS^*(p)$. Since $p \geq r$, $\hat{p} \geq r$
   and $\hat{p}$ is in PNF. Therefore, $\hat{p} \in POSS^*(r)$. Similarly,
   $\hat{p} \in POSS^*(q)$. Then, $\hat{p} \in POSS^*(r) \cup^{e} POSS^*(q)$,
   and so $POSS^*(p) \subseteq POSS^*(r) \cup^{e} POSS^*(q)$.

We conclude that null-extended union is a precise generalization of A-union for
$POSS^*$ with respect to unnesting. □

7.2.3.2 Null-extended difference

We  change  the  definition  of  extended  difference  to  include  null  values  by  keeping 
tuples  in  a  relation  only  if  they  are  not  subsumed  by  some  tuple  in  the  other 
relation. 


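Definition 7.16: Let $r_1$ and $r_2$ be relations on scheme $R$. Let $X$ range
over the zero order attributes in $E_R$ and $Y$ and $Z$ range over the higher
order attributes in $E_R$. The null-extended difference of $r_1$ and $r_2$ is:

\[
\begin{aligned}
r_1 -^{e'} r_2 = \{\, t \mid\;
  & (\exists t_1 \in r_1 \wedge \exists t_2 \in r_2 \wedge \exists Z \in E_R : \\
  & \quad (\forall X, Y \in E_R : t[X] = t_1[X] = t_2[X]
      \wedge t[Y] = (t_1[Y] -^{e'} t_2[Y]))) \\
  & \vee\; (t \in r_1 \wedge (\forall t' \in r_2 : \neg(t' \geq t))) \,\}
\end{aligned}
\]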

Proposition 7.12: Null-extended difference is an adequate and restricted
generalization of A-difference with respect to unnesting and possibility
function $POSS^*$, where the join dependency used in the A-difference is the
path set of the ¬1NF relation’s scheme tree.

Proof: We show adequate and then restricted.

adequate: $\mu^*(POSS^*(r -^{e'} q)) \supseteq \mu^*(POSS^*(r)) -^{A}
\mu^*(POSS^*(q))$.

   By Proposition 5.8, we know that extended difference is a precise
   generalization of A-difference, and so $\mu^*(POSS^*(r)) -^{A}
   \mu^*(POSS^*(q)) = \mu^*(POSS^*(r) -^{e} POSS^*(q))$. Thus, we need only
   show that $POSS^*(r -^{e'} q) \supseteq POSS^*(r) -^{e} POSS^*(q)$. Let
   $p = r -^{e'} q$, and $\hat{p} \in POSS^*(r) -^{e} POSS^*(q)$. Then, there
   exist $\hat{r} \in POSS^*(r)$ and $\hat{q} \in POSS^*(q)$, such that
   $\hat{p} = \hat{r} -^{e} \hat{q}$. Let $t_p$ be a tuple in $p$. Then, $t_p$
   must be in $r$ with, perhaps, some of its nested relations reduced by
   interaction with a tuple $t_q$ in $q$. Therefore, there must be tuples
   $t_{\hat{r}} \in \hat{r}$ and $t_{\hat{q}} \in \hat{q}$ which will also
   interact in the same way, noting that interaction occurs only when the zero
   order attributes have non-null values. Thus there is a tuple $t_{\hat{p}}$
   in $\hat{p}$, such that $t_{\hat{p}} \geq t_p$. We conclude that
   $\hat{p} \geq p$ and so $\hat{p} \in POSS^*(p)$. Therefore,
   $POSS^*(p) \supseteq POSS^*(r) -^{e} POSS^*(q)$.

restricted: there does not exist $p$ such that $\mu^*(POSS^*(r -^{e'} q))
\supsetneq \mu^*(POSS^*(p)) \supseteq \mu^*(POSS^*(r)) -^{A} \mu^*(POSS^*(q))$.

   As in the case for adequate, we need only show that there does not exist
   $p$ such that $POSS^*(r -^{e'} q) \supsetneq POSS^*(p) \supseteq
   POSS^*(r) -^{e} POSS^*(q)$. Suppose there is some $p$. If
   $POSS^*(r -^{e'} q) \supsetneq POSS^*(p)$, then there must be some tuple
   $t$ in $p$ that does not subsume any tuple in $r -^{e'} q$. This means
   that the non-null valued zero order attributes $X$ of $t$, or some nested
   relation in $t$, do not match any tuple on $X$ in the corresponding place
   in $r -^{e'} q$. Let $z$ be the relation (either $r$ or a nested relation
   in $r$) and $t'$ the tuple in $z$ where the matching does not occur, and
   $w$ be the corresponding, possibly empty, relation in $q$. There are two
   possible reasons for there not being a match: either $t'[X] \in z[X]$ and
   $\exists s \in w : s \geq t'$, or $t'[X] \notin z[X]$. In each case, the
   corresponding relation in $POSS^*(p)$ must contain a tuple which subsumes
   $t'$; however, $POSS^*(r) -^{e} POSS^*(q)$ contains a relation in which
   the corresponding relation does not. In the first case, the possibility of
   $t'$ can be eliminated by the possibility of $s$ in $w$ that subsumes it,
   and in the second case, simply choose not to include $t'$ in $POSS^*(r)$.
   Therefore, $POSS^*(p) \not\supseteq POSS^*(r) -^{e} POSS^*(q)$, which is a
   contradiction.

We conclude that null-extended difference is an adequate and restricted
generalization of A-difference for $POSS^*$ with respect to unnesting. □

7.2.3.3 Intersection, Cartesian Product, and Select

We will not formally define the “null-extended” versions of these operators.
A null-extended intersection can be obtained from union and difference by

\[
r_1 \cap^{e'} r_2 = (r_1 \cup^{e'} r_2) -^{e'}
   ((r_1 -^{e'} r_2) \cup^{e'} (r_2 -^{e'} r_1)).
\]

We note also that null-extended intersection is an adequate and restricted
generalization of standard intersection with respect to unnesting and PNF
possibility function $POSS^*$. For select we will use null-select as defined
in section 7.1, and the standard cartesian product operator.
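
For ordinary flat, null-free sets, the identity used above to obtain intersection from union and difference can be checked directly. The following Python fragment is only an illustration on built-in sets: it does not model nested relations, nulls, or subsumption, and the function name and sample tuples are hypothetical.

```python
def intersection_via_union_difference(r1, r2):
    """Intersection written using only union and difference."""
    symmetric_part = (r1 - r2) | (r2 - r1)   # tuples in exactly one operand
    return (r1 | r2) - symmetric_part        # what remains is in both operands


# Hypothetical flat relations represented as sets of tuples.
r1 = {(1, "Math"), (2, "English"), (3, "Physics")}
r2 = {(2, "English"), (3, "Physics"), (4, "Art")}
result = intersection_via_union_difference(r1, r2)
```

On these sets, `result` coincides with the built-in intersection `r1 & r2`.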

7.2.3.4 Join

The problems involved in defining join operations for relations with nulls and
for ¬1NF relations have been discussed before. Combining nulls and ¬1NF does
not improve the situation. However, our limited operator, extended natural
join, does have an adequate and restricted generalization with respect to PNF
possibility function $POSS^*$.

Definition 7.17: Let $X$ be the higher order attributes in
$E_{R_1} \cap E_{R_2}$, $A = E_{R_1} - X$, and $B = E_{R_2} - X$. Then the
null-extended natural join is $r_1 \bowtie^{e'} r_2$ which produces a relation
$r$ on scheme $R$ where:

2. $r = \{\, t \mid (\exists u \in r_1, v \in r_2 : t[A] = u[A] \wedge
   t[B] = v[B] \wedge t[X] = (u[X] \cap^{e'} v[X]) \wedge t[X] \neq \emptyset)
   \,\}$


Note  we  use  null-extended  intersection  to  combine  the  nested  relations,  and 
that  zero  order  attributes  can  only  have  equal  values  if  neither  is  ni  or  unk. 

Proposition 7.13: Null-extended natural join is an adequate and restricted
generalization of standard natural join with respect to unnesting and PNF
possibility function $POSS^*$.

Proof: By Proposition 5.10, we know that extended natural join is a precise
generalization for standard natural join. Therefore, we need only show that
null-extended natural join is an adequate and restricted generalization of
extended natural join. We show adequate and then restricted.

adequate: $POSS^*(r \bowtie^{e'} q) \supseteq POSS^*(r) \bowtie^{e} POSS^*(q)$.

   Let $p = r \bowtie^{e'} q$ and $\hat{p} \in POSS^*(r) \bowtie^{e}
   POSS^*(q)$. Also, let $C$ be the common zero order attributes of $r$ and
   $q$. Then, there must be $\hat{r} \in POSS^*(r)$ and
   $\hat{q} \in POSS^*(q)$ such that $\hat{p} = \hat{r} \bowtie^{e} \hat{q}$.
   Let $t_p$ be a tuple in $p$. Then, there are tuples $t_r \in r$ and
   $t_q \in q$ such that $t_p[C] = t_r[C] = t_q[C]$. There are also tuples
   $t_{\hat{r}} \in \hat{r}$ and $t_{\hat{q}} \in \hat{q}$ that agree on $C$
   with $t_p$, and will participate in the join giving $t_{\hat{p}}$. Now,
   the common higher order attributes $X$ of $t_{\hat{r}}$ and $t_{\hat{q}}$
   will participate in an extended intersection, the result of which will
   subsume the result of the null-extended intersection of $t_r[X]$ and
   $t_q[X]$. Therefore, $t_{\hat{p}} \geq t_p$, $\hat{p} \geq p$, and so
   $\hat{p} \in POSS^*(p)$.

restricted: there does not exist $p$ such that $POSS^*(r \bowtie^{e'} q)
\supsetneq POSS^*(p) \supseteq POSS^*(r) \bowtie^{e} POSS^*(q)$.

   Suppose there is some $p$. If $POSS^*(r \bowtie^{e'} q) \supsetneq
   POSS^*(p)$, then there must be some tuple $t$ in $p$ that does not subsume
   any tuple in $r \bowtie^{e'} q$. Thus, $t$ contains non-null values which
   must occur in any possibility of $p$, but not in all possibilities of
   $r \bowtie^{e'} q$. Consider the possibilities for tuples in $r$ and $q$
   which could exist to join to make a possibility for $t$. Since $t$ does
   not subsume any tuple in $r \bowtie^{e'} q$, it must either have
   projections on the common zero order attributes that are null or different
   actual values, or have different actual values in a common nested
   relation. In the first case, there is a possibility for tuples in $r$ and
   $q$ which set the null value to different actual values, and so they do
   not participate in the join. In the second and third case, every
   possibility of $p$ must contain those different values, yet there are
   possibilities of $r$ and $q$ which do not. Therefore, there is a
   possibility of $r$ and $q$ whose extended join is not a possibility of
   $p$. So, $POSS^*(p) \not\supseteq POSS^*(r) \bowtie^{e} POSS^*(q)$, which
   is a contradiction.

We conclude that null-extended natural join is an adequate and restricted
generalization of standard natural join with respect to unnesting and PNF
possibility function $POSS^*$.

7.2.3.5 Null-extended Projection

We  define  null-extended  projection  as  an  extended  projection  followed  by  tuple 
set  reduction,  or  as  a  tuple-wise  null-extended  union  of  the  usual  projection. 

Definition 7.18: The null-extended projection of relation $r$ on attributes
$X$ is

\[
\pi^{e'}_X(r) \;=\; \text{(tuple set reduction of } \pi^{e}_X(r)\text{)}
   \;=\; \bigcup^{e'}_{t \in r[X]} (t)
\]

Proposition  7.14:  Null-extended  projection  is  a  precise  generalization  of 
standard  projection  with  respect  to  unnesting  and  PNF  possibility  function 
POSS*. 

Proof:  Since  the  only  difference  between  null-extended  projection  and  ex¬ 
tended  projection  is  removal  of  subsumed  tuples,  the  proof  mirrors  the  proof 
for  null-extended  union  (Proposition  7.2).  □ 

7.3  Dependencies  in  a  Database  with  Null  Values 

A key assumption made in this chapter has been the requirement of partitioned
normal form. In the definition of PNF, we assume that certain multivalued
dependencies must hold in a 1NF relation before it can be legally nested into
a particular form. Furthermore, multivalued dependencies imply functional
dependencies in the nested relation. Therefore, it is important to determine
what effect the addition of null values will have on these dependencies.


In this section we will discuss the previous work on extending dependencies
to deal with nulls, providing some new clarifying information. We will
examine how these dependencies interact with the non-existent, unknown, and
no-information interpretation of nulls.

Figure 7-13. Relation satisfying A →→ B and B →→ C, but not A →→ C
when dne nulls are not equated.


7.3.1  Non-existent  Nulls 


In [Lie3], a sound and complete axiomatization for functional and multivalued
dependencies is given for a relational model in which dne nulls are allowed. In
this  model,  dne  nulls  are  not  considered  equal  to  each  other.  Notably  missing 
from  the  inference  rules  for  both  FDs  and  MVDs  is  the  transitivity  rule.  The 
problem  occurs  when  dne  nulls  appear  in  the  attribute  that  implements  the 
transitivity,  as  the  application  of  the  FD  and  MVD  rules  is  denied  when  null 
values  are  present  on  the  left  hand  side  of  the  rule. 

An example for MVDs is a relation r on scheme R = (A, B, C, D) where
A →→ B and B →→ C hold, but A →→ C does not hold (Figure 7-13).



We  assume  a  model  of  a  relation  in  which  tuples  or  fragments  of  tuples 
represent  fundamental  relationships  in  the  world  being  modeled.  Each  set  of 
attributes  that  is  involved  in  one  of  these  fundamental  relationships  is  called 
an  object  [FMU].  On  examining  the  first  two  tuples  in  relation  r,  it  must  be 
true  that  there  is  an  object  involving  attributes  A,  C,  and  D,  and  no  subset 
of  them.  Otherwise,  we  would  have  to  add  two  tuples  matching  the  first  two 
tuples  in  r  but  with  the  C  and  D  values  swapped.  However,  on  examining  the 
last  four  tuples,  where  dne  nulls  do  not  occur,  there  are  independent  AC  and 
AD  objects.  If  we  accept  this,  then  we  must  accept  the  fact  that  there  are 
two  different  semantics  for  tuples  in  r.  If  the  value  of  B  is  dne  then  an  ACD 
association  must  exist,  and  if  the  value  is  not  dne  then  independent  AC  and 
AD  associations  must  exist,  in  addition  to  associations  involving  B.  We  do  not 
believe  this  is  a  plausible  way  to  interpret  a  relation. 

The  solution  is  to  equate  dne  nulls  from  the  same  domain.  Then,  in 
a  database  with  only  dne  nulls  added,  the  definitions  of  FD  and  MVD  remain 
identical  to  the  standard  ones  and  the  same  axiomatization  is  valid.  This  is 
intuitively  pleasing  as  well,  since  a  dne  null  cannot  be  replaced  by  another 
value.  In  fact,  it  indicates  that  we  know  that  no  other  domain  value  is  valid. 


Non-existent nulls also require a more complicated test when tuples are
inserted into a relation. In addition to the usual tests to see that given
dependencies are not violated, we must ensure the exclusivity of the dne null
in each object in which it appears. For example, let us attempt to add the
tuple (a3, b, dne, d3) to relation r above. This insertion should be denied
since it is inconsistent that b is related to c1 and c2 and also that b is
related to no C value. This new integrity constraint is embodied in the
following rule.

   Exclusivity Rule for dne Nulls: Let $r$ be a relation with objects
   $\mathcal{O}$. For each $O \in \mathcal{O}$, in $\pi_O(r)$ there do not
   exist two tuples $t_1$ and $t_2$ where $t_1[A] = dne$,
   $t_1[A] \neq t_2[A]$, and $t_1[O - A] = t_2[O - A]$, for any $A \in O$.
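
The exclusivity rule can be read as a check to run before an insertion is accepted. The following Python fragment is a minimal sketch under stated assumptions: tuples are dictionaries, the dne null is the string "dne", and the object {B, C} and the sample rows are hypothetical because the contents of Figure 7-13 are not reproduced here. The function names are illustrative only.

```python
DNE = "dne"  # assumed representation of the non-existent null


def project(relation, attrs):
    """Project a list of dict-tuples onto attrs, dropping duplicates."""
    out = []
    for t in relation:
        p = {a: t[a] for a in attrs}
        if p not in out:
            out.append(p)
    return out


def violates_exclusivity(relation, objects):
    """True if, for some object O and attribute A in O, the projection on O
    holds tuples t1, t2 with t1[A] = dne, t2[A] != dne, and equal values on
    the remaining attributes O - A."""
    for obj in objects:
        proj = project(relation, obj)
        for a in obj:
            rest = [x for x in obj if x != a]
            for t1 in proj:
                if t1[a] != DNE:
                    continue
                for t2 in proj:
                    if t2[a] != DNE and all(t1[x] == t2[x] for x in rest):
                        return True
    return False


# Hypothetical rows in which b is related to c1 and c2.
r = [
    {"A": "a1", "B": "b", "C": "c1", "D": "d1"},
    {"A": "a2", "B": "b", "C": "c2", "D": "d2"},
]
candidate = {"A": "a3", "B": "b", "C": DNE, "D": "d3"}
objects = [("B", "C")]  # assumed object, for illustration only

ok_before = not violates_exclusivity(r, objects)
rejected = violates_exclusivity(r + [candidate], objects)
```

With these rows the existing relation passes the check, while adding the candidate tuple is flagged, which matches the reasoning given for denying the insertion.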

7.3.2 Unknown Nulls

The effect of unk nulls on functional dependencies has been adequately covered
in [Vas1]. The definition of an FD must be modified so that unk nulls are not
equivalent. This must be the case since we have no way of knowing whether
two unk nulls will turn out to be the same or different values. The same logic
holds for MVDs. However, unlike the assumptions made by [Lie2, Lie3] for dne
nulls, even though we cannot apply an FD to adjust values or an MVD to add
tuples when there are unk nulls on the left hand side of the dependency, we
still have the usual axiomatization for FDs and MVDs. In proof, suppose we
have a relation that satisfies some given dependencies, but not some
dependency which follows from the usual axiomatization. An example is relation
r in Figure 7-13, with unk nulls replacing the dne nulls. Since unk nulls are
placeholders for actual facts about the world, the dependencies with which we
have constrained the world are not altered by the presence of these nulls.
Therefore, dependencies which follow from the given dependencies in a world
without null values must still hold in a world with nulls. Thus, a relation
such as r with unk nulls must not be a complete or accurate representation of
the world, since for any relation r, every relation in POSS(r) must satisfy
all FDs and MVDs which can be derived from the given dependencies.

7.3.3 No-information Nulls

The  only  published  work  dealing  with  dependencies  and  the  no-information 
interpretation  of  nulls  is  an  axiomatization  of  FDs  by  [AM].  As  in  previous 
approaches,  they  redefine  the  FD  so  that  it  is  applicable  only  when  non-null 
values  are  present.  Therefore,  they  conclude  the  same  results  as  [Lie3],  about 
the  lack  of  transitivity  in  this  model.  Based  on  the  lattice  developed  in  section 
7.1,  we  know  that  an  ni  null  will  eventually  be  replaced  by  either  an  unk  null 
or  a  dne  null  when  we  find  out  whether  or  not  a  value  actually  exists.  Hence, 
given  a  relation  r  with  ni  nulls,  in  any  relation  in  POSS(r)  all  ni  nulls  will  be 
replaced  by  actual  values  or  by  dne.  As  discussed  earlier  in  this  section,  in  these 
cases,  there  is  no  valid  reason  not  to  retain  the  same  axiomatization  for  FDs 
and  MVDs  as  for  relations  without  nulls,  and  to  do  so  would  possibly  eliminate 
important  dependencies  for  use  in  database  design  and  normalization.  Thus, 
we repeat an earlier statement, that the definitions of FD and MVD need not
be changed as long as we adopt the convention that two values from the same
extended domain are equal if they are the same value and neither one is ni or
unk.

7.3.4 Join Dependency

At first glance, there doesn’t seem to be any good way to define the join
dependency on relations with nulls. Consider the tuple (a, ni, c) defined on
scheme R = (A, B, C). Normally any one-tuple relation satisfies any join
dependency since any projections of the tuple will obviously join to form the
original tuple. However, with the given tuple, the join dependency *(AB, BC)
does not hold since the projections will not join on ni. However, the MVD
which follows from this join dependency, B →→ A, does hold by default. What
we need is a “default” for the join dependency when ni or unk nulls are
present in the join attributes. We have decided that, in general, ni and unk
nulls should not be equated with each other. However, each null does stand
for one and only one value (actual or dne), and so if a null is transported to
more than one place we should identify them to be the same. Therefore, we
mark ni and unk nulls before applying the test for satisfying the join
dependency, doing so by equating identically marked nulls. We now have an
appropriate definition for a join dependency in our framework and we can use
the existing theory for deriving MVDs from valid join dependencies.
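
The effect of marking can be illustrated on the one-tuple example above. The following Python fragment is a sketch under stated assumptions: unmarked nulls are the strings "ni" and "unk", a marked null is a pair (kind, mark), and all function names are hypothetical. It is not a full treatment of join dependencies on nested relations.

```python
from itertools import count

NULL_KINDS = {"ni", "unk"}  # assumed representation of unmarked nulls


def values_equal(x, y):
    """Unmarked ni/unk nulls equal nothing; marked nulls compare by mark."""
    if x in NULL_KINDS or y in NULL_KINDS:
        return False
    return x == y


def mark_nulls(relation):
    """Give every null occurrence in the relation its own mark."""
    marks = count(1)
    return [
        {a: ((v, next(marks)) if v in NULL_KINDS else v) for a, v in t.items()}
        for t in relation
    ]


def project(relation, attrs):
    """Projection onto attrs (duplicate elimination uses plain equality)."""
    out = []
    for t in relation:
        p = {a: t[a] for a in attrs}
        if p not in out:
            out.append(p)
    return out


def join(r1, r2):
    """Natural join that compares common attributes with values_equal."""
    out = []
    for u in r1:
        for v in r2:
            common = [a for a in u if a in v]
            if all(values_equal(u[a], v[a]) for a in common):
                t = {**u, **v}
                if t not in out:
                    out.append(t)
    return out


def satisfies_jd(relation, components):
    """Test *(components): joining the projections must give back the relation."""
    joined = project(relation, components[0])
    for comp in components[1:]:
        joined = join(joined, project(relation, comp))
    return all(t in joined for t in relation) and all(t in relation for t in joined)


r = [{"A": "a", "B": "ni", "C": "c"}]
jd = [("A", "B"), ("B", "C")]
unmarked_ok = satisfies_jd(r, jd)            # projections do not join on ni
marked_ok = satisfies_jd(mark_nulls(r), jd)  # the same marked null joins with itself
```

The nulls are marked once, before projecting, so the two projections carry the same mark for the single null occurrence and therefore join.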


Chapter  8 

The  SQL/NF  Query  Language 


In Chapter 4, we defined a formal predicate-calculus-based language for dealing
with ¬1NF relations. That language defines a minimal degree of power that
we expect from any language designed to operate on nested relations. For
real-world users of a database system, however, a terse predicate-calculus
language is too difficult to use. These considerations have led to the
definition of several “syntactically-sugared” query languages such as SQL,
Query-By-Example, and QUEL. In this chapter, we extend one of the most
widely-used of these languages, SQL [C+], to operate on a database of nested
relations. Most of our extensions pertain to the data manipulation part of
SQL, although we also extend the SQL data definition language to permit the
definition of ¬1NF databases. In defining SQL/NF, an important goal was to
retain the “spirit” of the
existing  SQL  language  so  as  to  reduce  the  effort  required  on  the  part  of  existing 
SQL  users  to  learn  SQL/NF.  Roughly  speaking,  wherever  a  constant  or  scalar- 
valued  variable  may  appear  in  SQL,  a  relation  or  expression  evaluating  to  a 
relation  may  appear  in  SQL/NF.  We  introduce  new  commands  to  transform  a 
relation  to  an  equivalent  nested  one  (the  nest  operation)  and  to  transform  a 
nested  relation  into  a  less-nested  one  (the  unnest  operation).  We  shall  assume 
that  the  reader  is  familiar  with  standard  SQL  as  described  in  [C+]  or  in  most 
standard  database  texts. 

Although we retain most constructs of the SQL language, we would be remiss
if we did not make some obvious improvements in the SQL language. Some of
these changes are due to the availability of nested relations. For instance,
difficult SQL queries involving GROUP BY and HAVING can be eliminated in favor
of rather straightforward queries on properly structured ¬1NF relations. Other
changes are simply to correct some mistakes made in the design of SQL. In
Date’s critique of the SQL language [Dat1], a good case is made for requiring
certain modifications to the SQL language. One of the driving forces behind
the critique is the language design maxim, the principle of orthogonality.
This principle requires separate treatment for distinct concepts, and similar
treatment for similar concepts [Dat2]. The following definition of the SQL/NF
language takes into account this principle, directly incorporating some of the
modifications proposed in [Dat1]. We also follow where possible the proposed
standard relational database language [X3H2], which already incorporates some
of the changes suggested here, although relying on a 1NF database model.

The remainder of this chapter is organized as follows. Sections 8.1-8.3
contain our definition and description of the SQL/NF language. We define the
query facilities, the data manipulation language and the data definition
language. Host language support is beyond the scope of this dissertation.
Section 8.4 compares our language with some previous attempts at defining a
high-level query language for ¬1NF models. We also provide a BNF definition
of our language in Appendix B.

8.1  Query  Facilities 

In  SQL,  a  basic  query  conforms  to  the  structure 

SELECT  attribute-list 
FROM  relation-list 
WHERE  predicate 

This SFW-expression can be conceptually executed by forming the cartesian
product of all relations in the relation-list, choosing only tuples in this
product that satisfy the predicate, and then choosing only those attributes in
the attribute-list. If no qualification of tuples is needed then the WHERE
clause can be omitted. If all attributes of the relations in the relation-list
are desired then SQL allows the use of “*” instead of actually specifying all
attributes in the attribute-list. In the proposed standard relational database
language (hereafter called RDL), the keyword ALL is used instead of “*”
[X3H2].

The obvious way to access the entire contents of a relation would be to simply
state the relation name. However, in SQL, we have to use

SELECT  *
FROM  relation-name

In SQL/NF, we allow free substitution of relation-name for “SELECT * FROM
relation-name” and adopt the RDL change substituting ALL for “*”. Consider
the 1NF database in Figure 8-1. The department (Dept) relation has three
attributes, department number (dno), department name (dname), and location
(loc). The employee (Emp) relation has four attributes, employee number (eno),
employee name (ename), department number of department in which employee
works (dno), and salary (sal).

Dept
dno   dname           loc
10    Manufacturing   Austin
20    Personnel       Dallas
30    Retail          Austin
...

Emp
eno   ename    dno   sal
13    Smith    10    20000
33    Jones    30    14000
34    Adams    10    15000
48    Miller   10    40000
...

Figure 8-1. A sample 1NF database.

The query to get employee data for employees in department 10 is

SELECT  ALL 
FROM  Emp 
WHERE  dno  =  10 


or, using our simplified notation,

Emp  WHERE  dno  =  10 
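
The conceptual evaluation of an SFW-expression described at the start of this section (cartesian product, then selection, then projection) can be sketched in Python as follows. This is an illustration only, not the SQL/NF implementation: the function `sfw`, the use of qualified names such as "Emp.dno", and the set-style duplicate elimination are assumptions made for the sketch, and only the rows listed in Figure 8-1 are used.

```python
from itertools import product

# Rows listed in Figure 8-1.
Dept = [
    {"dno": 10, "dname": "Manufacturing", "loc": "Austin"},
    {"dno": 20, "dname": "Personnel", "loc": "Dallas"},
    {"dno": 30, "dname": "Retail", "loc": "Austin"},
]
Emp = [
    {"eno": 13, "ename": "Smith", "dno": 10, "sal": 20000},
    {"eno": 33, "ename": "Jones", "dno": 30, "sal": 14000},
    {"eno": 34, "ename": "Adams", "dno": 10, "sal": 15000},
    {"eno": 48, "ename": "Miller", "dno": 10, "sal": 40000},
]


def sfw(attribute_list, relations, predicate=None):
    """Conceptual SELECT-FROM-WHERE: product, selection, projection.

    relations maps a relation name to its tuples; attributes are qualified
    as "Name.attr". attribute_list=None plays the role of ALL. The result is
    treated as a set, so duplicate rows are dropped.
    """
    names = list(relations)
    result = []
    for combo in product(*(relations[n] for n in names)):
        row = {f"{n}.{a}": v for n, t in zip(names, combo) for a, v in t.items()}
        if predicate is None or predicate(row):
            out = row if attribute_list is None else {a: row[a] for a in attribute_list}
            if out not in result:
                result.append(out)
    return result


# SELECT ALL FROM Emp WHERE dno = 10
dept10 = sfw(None, {"Emp": Emp}, lambda row: row["Emp.dno"] == 10)

# Names of employees whose department is located in Austin (a two-relation query).
austin = sfw(
    ["Emp.ename"],
    {"Emp": Emp, "Dept": Dept},
    lambda row: row["Emp.dno"] == row["Dept.dno"] and row["Dept.loc"] == "Austin",
)
```

A real system would not materialize the full product; the sketch follows the conceptual order of evaluation only.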


To get departments which have at least one employee, the query is

SELECT  ALL
FROM  Dept
WHERE  EXISTS  (SELECT  ALL
               FROM  Emp
               WHERE  Dept.dno  =  Emp.dno)

or,  in  SQL/NF, 

Dept  WHERE  EXISTS  (Emp  WHERE  Dept.dno  =  Emp.dno) 


This last query easily paraphrases as: Get department tuples where there exist
employee tuples where the department numbers are the same. The same cannot be
said about the strict SQL version. In fact it is not clear why we are
selecting any attributes at all in the EXISTS subquery, since our goal is not
to actually extract any information from the employee relation but rather to
test for its existence.
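
The point that EXISTS is purely a test can be seen in a small Python sketch of the same query over the rows listed in Figure 8-1. The representation of relations as lists of dictionaries is an assumption of the sketch; nothing is selected from Emp, only the existence of a matching tuple is tested.

```python
# Rows listed in Figure 8-1 (only the attributes needed here).
Dept = [
    {"dno": 10, "dname": "Manufacturing", "loc": "Austin"},
    {"dno": 20, "dname": "Personnel", "loc": "Dallas"},
    {"dno": 30, "dname": "Retail", "loc": "Austin"},
]
Emp = [
    {"ename": "Smith", "dno": 10},
    {"ename": "Jones", "dno": 30},
    {"ename": "Adams", "dno": 10},
    {"ename": "Miller", "dno": 10},
]

# Dept WHERE EXISTS (Emp WHERE Dept.dno = Emp.dno)
with_employees = [d for d in Dept if any(e["dno"] == d["dno"] for e in Emp)]
```

On the listed rows, department 20 has no employees and is therefore excluded.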

8.1.1  Nested  Expressions 

A  language  should  provide,  for  each  class  of  object  it  sup¬ 
ports,  a  general,  recursively  defined  syntax  for  expressions 
that  exploits  to  the  full  any  closure  properties  the  object 
class may possess [Dat1; 12].

The primary objects a relational database language supports are scalar
(atomic) values and relations. In 1NF databases, each relation is comprised
strictly of scalar values. In ¬1NF databases, each relation may be comprised of
other relations as well as scalar values. The principle of orthogonality has
been usefully employed in defining the ¬1NF data structure. Wherever a scalar
value could occur in a 1NF relation, a relation can now occur. This simple
transformation is also employed in our definition of the data sublanguage. SQL
has the closure property where the result of any query on one or more relations
is itself a relation. The principle of orthogonality suggests that we should
allow an SFW-expression wherever a relation name could exist. In SQL this
means allowing SFW-expressions in the FROM clause.


The first use of such a modification is the building of incremental queries.
Using the database of Figure 8-1, consider the query: Get names of employees
who work in the shipping department. The first step a user may recognize is
the need to join the Emp and Dept relations on dno. So he forms the query

SELECT  ALL 

FROM  Emp,  Dept 

WHERE  Emp. dno  =  Dept.dno 

Then  from  this  relation  get  names  of  employees  in  the  shipping  department 
producing 

SELECT  ename 
FROM  (SELECT  ALL 

FROM  Emp,  Dept 

WHERE  Emp.dno  =  Dept.dno) 

WHERE  dname  =  "Shipping" 

An  SQL/NF-level  query  optimizer  could  then  translate  this  query  into  the 
equivalent  query 

SELECT  ename 
FROM  Emp,  Dept 
WHERE  Emp.dno  =  Dept.dno 
AND  dname  =  "Shipping" 
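
The equivalence between the nested and the flattened form can be checked on a small example. The following Python sketch is only an illustration: the sample rows (including the Shipping department, which does not appear in the figures) and the helper names are assumptions, and relations are modeled as lists of dictionaries.

```python
# Hypothetical sample rows; the Shipping department is invented for illustration.
dept = [
    {"dno": 10, "dname": "Manufacturing", "loc": "Austin"},
    {"dno": 40, "dname": "Shipping", "loc": "Dallas"},
]
emp = [
    {"eno": 13, "ename": "Smith", "dno": 10, "sal": 20000},
    {"eno": 51, "ename": "Baker", "dno": 40, "sal": 18000},
    {"eno": 52, "ename": "Clark", "dno": 40, "sal": 26000},
]


def join_on_dno(emps, depts):
    """SELECT ALL FROM Emp, Dept WHERE Emp.dno = Dept.dno."""
    return [{**e, **d} for e in emps for d in depts if e["dno"] == d["dno"]]


def nested_form(emps, depts):
    """Outer SFW-expression applied to the relation built in the FROM clause."""
    joined = join_on_dno(emps, depts)
    return {row["ename"] for row in joined if row["dname"] == "Shipping"}


def flattened_form(emps, depts):
    """Single SFW-expression with both predicates in one WHERE clause."""
    return {
        e["ename"]
        for e in emps
        for d in depts
        if e["dno"] == d["dno"] and d["dname"] == "Shipping"
    }


shipping_nested = nested_form(emp, dept)
shipping_flat = flattened_form(emp, dept)
```

Both forms return the same set of names, which is the property an optimizer would rely on when rewriting one into the other.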

A more useful example of nested expressions in the FROM clause
involves the UNION operator. UNION is an infix operator in SQL and is used in
the form

SFW-expression UNION SFW-expression

Note that, in SQL, one must use SFW-expressions with UNION and not relations.
To illustrate, suppose we have two employee relations (in the form of
Figure 8-1). The Emp-exec relation contains executive level employees and the
Emp-other relation contains all other employees. If we want to get all
employees, we write the SQL query

SELECT  * 

FROM  Emp-exec 

UNION 

SELECT  * 

FROM  Emp-other 


In  SQL/NF,  we  can  write  the  simpler  query 


Emp-exec  UNION  Emp-other 


Now,  if  we  modify  our  query  so  that  we  get  all  employees  who  make  more  than 
$35,000,  then  we  must  add  a  WHERE  clause  to  each  SFW-expression  in  the  SQL 
query. 

SELECT  * 

FROM  Emp-exec 
WHERE  sal  >  35000 
UNION 
SELECT  * 

FROM  Emp-other 
WHERE  sal  >  35000 


Using  SQL/NF,  we  can  form  the  union  first,  place  that  expression  in  the  FROM 

clause,  and  qualify  all  tuples  with  one  WHERE  clause. 

SELECT  ALL 

FROM  (Emp-exec  UNION  Emp-other) 

WHERE  sal  >  35000 
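
As a small check that qualifying the union once gives the same answer as qualifying each operand, the following Python sketch models the two relations as sets of (eno, ename, sal) tuples. The rows and the helper name are assumptions made for illustration.

```python
# Hypothetical contents for the two employee relations: (eno, ename, sal).
emp_exec = {(1, "Hall", 90000), (2, "Ives", 30000)}
emp_other = {(3, "Kent", 40000), (4, "Lowe", 20000)}


def where_sal_over(relation, limit):
    """Keep the tuples whose salary exceeds the limit."""
    return {t for t in relation if t[2] > limit}


# SQL style: one WHERE clause per SFW-expression, then UNION.
filtered_then_unioned = where_sal_over(emp_exec, 35000) | where_sal_over(emp_other, 35000)

# SQL/NF style: UNION in the FROM clause, then a single WHERE clause.
unioned_then_filtered = where_sal_over(emp_exec | emp_other, 35000)
```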


Nested expressions are even more applicable when using a ¬1NF database.
Since attributes may now be relation valued, relation names may occur
in the SELECT clause of a query, and so, under the principle of orthogonality,
we allow SFW-expressions in the SELECT clause.

Company
+-----+---------------+--------+-----------------------+
| dno | dname         | loc    | Emps                  |
|     |               |        | eno | ename  | sal    |
+-----+---------------+--------+-----+--------+--------+
| 10  | Manufacturing | Austin | 13  | Smith  | 20000  |
|     |               |        | 34  | Adams  | 35000  |
|     |               |        | 48  | Miller | 40000  |
+-----+---------------+--------+-----+--------+--------+
| 20  | Personnel     | Dallas |     |        |        |
+-----+---------------+--------+-----+--------+--------+
| 30  | Retail        | Austin | 33  | Jones  | 14000  |
+-----+---------------+--------+-----+--------+--------+
| ... |               |        |     |        |        |
+-----+---------------+--------+-----+--------+--------+

Figure 8-2. A sample ¬1NF database.

For the following examples
we will use the ¬1NF database in Figure 8-2, in which we have combined the
data of the Dept and Emp relations used in Figure 8-1. The company (Company)
relation has four attributes: department number (dno), department name
(dname), location (loc), and employees (Emps). Each Emps relation has three
attributes: employee number (eno), employee name (ename), and salary (sal).
Recall that, for notational simplicity, we eliminate the set braces which normally
would occur around each Emps relation in Figure 8-2. Consider a query to get
department names and the employees in each department making more than
$35,000. First consider getting all employees. The query is


SELECT  dname,  (  SELECT  ALL 

FROM  Emps) 

FROM  Company 

Now  to  limit  employees  to  those  making  more  than  $35,000,  it  is  a  simple 

matter  of  adding  a  WHERE  clause  to  the  nested  SFW-expression,  giving 

SELECT  dname,  (  SELECT  ALL 
FROM  Emps 
WHERE  sal  >  35000) 

FROM  Company 

or 

SELECT  dname,  (Emps  WHERE  sal  >  35000) 

FROM  Company 

Note that an SFW-expression may produce an empty relation. In the last query
the “Personnel” tuple will have an empty Emps relation since it was empty to
begin with, and the “Retail” tuple will have an empty Emps relation since none
of its employees satisfy the “sal > 35000” predicate. To eliminate tuples with
empty Emps relations in the result we can write

SELECT  dname,  (Emps  WHERE  sal  >  35000) 

FROM  Company 

WHERE  EXISTS(Emps  WHERE  sal  >  35000) 

Later,  we  introduce  a  technique  for  referencing  the  new  Emps  relation  the  first 
time  it  is  mentioned,  avoiding  the  duplicate  specification  of  the  nested  query. 
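
The effect of a WHERE clause inside a nested SFW-expression, and of the outer EXISTS test, can be traced on the three departments visible in Figure 8-2. The Python sketch below is an illustration under assumptions: nested relations are modeled as lists of (eno, ename, sal) tuples, and the helper name rich is hypothetical.

```python
# The three departments shown in Figure 8-2; Emps holds (eno, ename, sal) tuples.
company = [
    {"dno": 10, "dname": "Manufacturing", "loc": "Austin",
     "Emps": [(13, "Smith", 20000), (34, "Adams", 35000), (48, "Miller", 40000)]},
    {"dno": 20, "dname": "Personnel", "loc": "Dallas", "Emps": []},
    {"dno": 30, "dname": "Retail", "loc": "Austin", "Emps": [(33, "Jones", 14000)]},
]


def rich(emps):
    """Emps WHERE sal > 35000."""
    return [e for e in emps if e[2] > 35000]


# SELECT dname, (Emps WHERE sal > 35000) FROM Company
with_empties = [(c["dname"], rich(c["Emps"])) for c in company]

# The same query with WHERE EXISTS(Emps WHERE sal > 35000) added.
without_empties = [(c["dname"], rich(c["Emps"])) for c in company if rich(c["Emps"])]
```

Note that Adams, at exactly 35000, does not satisfy the strict comparison, so only Miller survives in the Manufacturing tuple.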

Not only can we select specific tuples from a nested relation, we can also
select specific attributes. To illustrate, consider the query to get department
names and locations and employee names and salaries. We write

SELECT dname, loc, (SELECT ename, sal
                    FROM Emps)
FROM Company


Combining the above techniques, we can write the following query. Get
department names and locations, and employee names and salaries where the
location is ‘Austin’ and the employee salary is more than $35,000.

SELECT  dname,  loc,  (  SELECT  ename,  sal 

FROM  Emps 
WHERE  sal  >  35000) 

FROM  Company 

WHERE  loc  =  ‘Austin’ 

8.1.2  Functions 

In SQL, the argument to a function such as SUM is a column
of scalar values and the result is a single scalar value; hence,
orthogonality dictates that (a) any column-expression should
be permitted as the argument, and (b) the function-reference
should be permitted in any context in which a scalar can
appear. However, (a) the argument is in fact specified in
a most unorthodox manner, which means in turn that (b)
function references can actually appear only in a very small
set of special-case situations [Dat1; 20].


Date’s arguments carry even more weight when we assume a ¬1NF model.
Here we have built-in sets of values in the form of nested relations, and it
makes more sense to apply functions to relations rather than artificially applying
them to attributes. Then, by the principle of orthogonality, we should be able
to apply functions to any expression that evaluates to a relation.

Consider first the 1NF database of Figure 8-1, and a query to find the
total amount made by all employees. In SQL, we would write

SELECT  SUM(sal) 

FROM  Emp 

The argument to SUM is actually the entire sal column of Emp, whereas a
reference to sal in a WHERE clause (e.g., sal > 5000) refers to individual
sal values. Therefore, we adopt Date’s suggestion to apply functions to their
actual argument. Thus, our query becomes

SUM(SELECT  sal 
FROM  Emp) 

Another  example  is  the  query  which  gets  all  departments  that  employ  more 
than  10  people: 

SELECT dno
FROM Dept
WHERE COUNT(SELECT *
            FROM Emp
            WHERE Dept.dno = Emp.dno) > 10


or  using  our  simplified  notation  which  substitutes  “Emp”  for  “SELECT  *  FROM 
Emp”: 

SELECT  dno 
FROM  Dept 

WHERE COUNT(Emp WHERE Dept.dno = Emp.dno) > 10

In SQL, the latter query would usually be formulated using GROUP BY and
HAVING.

SELECT dno
FROM Dept
WHERE dno IN
    (SELECT dno
     FROM Emp
     GROUP BY dno
     HAVING COUNT(DISTINCT eno) > 10)


GROUP  BY  introduces  a  new  structure  into  the  relational  model:  partitioned 
relations.  The  only  attributes  which  can  be  selected  from  a  partitioned  relation 
are  the  “group  by”  attributes,  i.e.,  those  that  have  the  same  value  for  each 
partition,  and  single-valued  functions  of  any  attribute.  Normally,  a  function 
operates  on  the  entire  relation,  but  when  a  relation  is  partitioned,  the  function 
is applied separately to each partition. Thus, we have a new structure which,
incidentally, is not in 1NF, with new rules for the execution of SFW-expressions,
and a new HAVING clause to test predicates on partitions.


In some cases GROUP BY and HAVING are not necessary, for instance
when the values of the functions are not being retrieved in a SELECT
clause. A legal SQL formulation of the last query is

SELECT dno
FROM Dept
WHERE 10 < (SELECT COUNT(*)
            FROM Emp
            WHERE Emp.dno = Dept.dno)


Furthermore, if we allow nested queries in the SELECT clause then GROUP BY
and HAVING are totally unnecessary. For example, to retrieve the counts of
employees for each department we could write

SELECT dno, COUNT(Emp WHERE Emp.dno = Dept.dno)
FROM Dept

Now let us consider the ¬1NF database in Figure 8-2. Since employees
have already been “grouped by” department, our queries are easier to formulate.
To get the employee counts, we write

SELECT  dno,  COUNT(Emps) 

FROM  Company 

To  get  departments  where  the  employee  count  is  more  than  10,  we  write 

SELECT  dno 

FROM  Company 

WHERE  COUNT(Emps)  >  10 

By structuring relations appropriately, we can turn any GROUP BY/HAVING
query into a straightforward SFW-expression. Since these types of queries
are some of the hardest to formulate in SQL, and operate under a different set
of rules from standard SQL queries, their elimination is a major advantage of
the ¬1NF model.

A further advantage of using relations or nested expressions as input
to functions is the ability to use multi-attribute relations and have the function
apply to several attributes simultaneously. For example, suppose we have a
Sales relation with employee number (eno) and 12 sales attributes (Jan-sales,
Feb-sales, ..., Dec-sales) showing total sales for each month of the year for the
employee. Then to get the total of all sales in each month we can write

SUM(SELECT Jan-sales, Feb-sales, Mar-sales, Apr-sales, May-sales, Jun-sales,
           Jul-sales, Aug-sales, Sep-sales, Oct-sales, Nov-sales, Dec-sales
    FROM Sales)

The SUM function is applied to each column of the argument relation. In general,
a column function (SUM, AVG, MAX, MIN) reduces a relation to a single tuple
with the same number of attributes, by applying the function to each column
of the relation. A table function (COUNT) reduces a relation to a single tuple
with one attribute. Thus, the result of applying a function is always a
single-tuple relation.
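
The distinction between column functions and table functions can be sketched in a few lines of Python. This is an illustration only: the helper names are hypothetical, relations are modeled as lists of equal-length tuples, and the Sales rows are invented and cut down to three monthly columns.

```python
def column_function(fn, relation):
    """Apply fn to every column; the result is a single-tuple relation
    with the same number of attributes as the argument."""
    columns = list(zip(*relation))
    return [tuple(fn(col) for col in columns)]


def table_count(relation):
    """COUNT as a table function: a single tuple with one attribute."""
    return [(len(relation),)]


# Hypothetical projection of Sales onto three monthly sales columns.
sales = [(100, 120, 90), (200, 80, 110), (50, 60, 70)]

monthly_totals = column_function(sum, sales)   # SUM(...)
monthly_best = column_function(max, sales)     # MAX(...)
row_count = table_count(sales)                 # COUNT(...)
```

In every case the result is a relation holding exactly one tuple, which is what allows a function reference to be nested inside another expression.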

8.1.3 Null Values and Operations Dealing with Nulls

One question that usually arises when dealing with functions concerns the
presence of null values. SQL makes the decision to ignore null values in all
functions except COUNT. An unfortunate consequence of this is that the equality of
AVG(Rel) * COUNT(Rel) and SUM(Rel) may be violated. We believe that nulls
should not be ignored in any function; rather they should, when appropriate,
produce an error. This forces the user to remove the nulls before applying the
function and also prevents him from believing he has received a precise answer
to a query which is, in fact, based on imprecise data.
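
The violated equality is easy to reproduce. The sketch below uses Python's standard sqlite3 module with SQLite standing in for "SQL" (an assumption; the text does not name a particular system), and an invented three-row Emp table containing one null salary. The strict_sum helper is a hypothetical illustration of the error-raising behaviour argued for above, not a definition taken from SQL/NF.

```python
import sqlite3

# In-memory table with one null salary (rows are invented for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Emp (eno INTEGER, sal REAL)")
conn.executemany(
    "INSERT INTO Emp VALUES (?, ?)",
    [(1, 10000.0), (2, 20000.0), (3, None)],
)
avg_sal, row_count, sum_sal = conn.execute(
    "SELECT AVG(sal), COUNT(*), SUM(sal) FROM Emp"
).fetchone()
conn.close()

# AVG and SUM skip the null, COUNT(*) does not, so the product overshoots.
product = avg_sal * row_count


def strict_sum(values):
    """Hypothetical strict variant: refuse to aggregate over nulls."""
    if any(v is None for v in values):
        raise ValueError("null value in function argument")
    return sum(values)
```

Here the average is taken over two salaries while the count covers three tuples, so the product and the sum disagree.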

A thorough treatment of nulls for ¬1NF databases was given in Chapter 7.
We saw that one of the functions which is usually required when dealing
with null values is a method for eliminating subsumed tuples in a relation.
In SQL/NF we allow a single atomic null value, denoted NULL. A tuple t is
subsumed by tuple q if t's non-null attributes have the same values as the
corresponding attributes in q. For example, the tuple t = <Smith, NULL, NULL>
is subsumed by <Smith, 10, NULL> and by <Smith, 20, 15000>, but not by
<Jones, NULL, 15000>. Subsumed tuples are like duplicate tuples in that they
do not provide any more information than some other tuple in the relation.
When nested relations are attributes the definition is applied recursively, so
that relation r subsumes relation s if every tuple in s is subsumed by some
tuple in r.

Although SQL eliminates duplicate tuples via the SELECT DISTINCT
construct, it does not eliminate subsumed tuples, even though null values are
allowed. Therefore, we introduce the SUBSUME function to eliminate subsumed
tuples from a relation. Note that SUBSUME also removes duplicate tuples, since
by definition if t = q then t subsumes q and q subsumes t. We also use our
standard notation for applying a function for the syntax of DISTINCT and
SUBSUME.
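
For flat tuples, the subsumption test and the SUBSUME function can be sketched as follows. This Python illustration rests on assumptions: NULL is modeled as None, only atomic attributes are handled (the recursive case for nested relations is omitted), and the function names are hypothetical.

```python
NULL = None


def subsumed_by(t, q):
    """True if every non-null attribute of t equals the matching attribute of q."""
    return all(a is NULL or a == b for a, b in zip(t, q))


def subsume(relation):
    """Remove duplicates and every tuple subsumed by a different tuple."""
    distinct = list(dict.fromkeys(relation))  # drop duplicates, keep order
    return [
        t for t in distinct
        if not any(t != q and subsumed_by(t, q) for q in distinct)
    ]


t = ("Smith", NULL, NULL)
relation = [
    t,
    ("Smith", 10, NULL),
    ("Smith", 20, 15000),
    ("Jones", NULL, 15000),
    ("Smith", 10, NULL),  # duplicate
]
reduced = subsume(relation)
```

In this example only the tuple t and the duplicate are removed; the two remaining Smith tuples do not subsume one another because they disagree on a non-null attribute.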

To  eliminate  duplicates  from  the  Company  relation  we  use 
DISTINCT(Company) 


To get department names and employee names and salaries, eliminating
subsumed employee tuples, we use

SELECT dname, SUBSUME(SELECT ename, sal
                      FROM Emps)
FROM Company

If  we  want  to  eliminate  duplicates  before  counting  the  number  of  tuples  in  the 
Company  relation,  our  query  is 

COUNT(DISTINCT(Company)) 

Another important operation which becomes available when null values
are supported is the outer join. The outer join is similar to a traditional
join, except that tuples which normally would not participate in the join are
added to the result, with null values used for the attributes that come from
the other relation.
In [Dat3], a proposal is made for supporting the outer join operation with a
PRESERVE clause. All tuples of the relations specified in the PRESERVE clause
are included in the resulting relation even if they do not satisfy the predicates
of the WHERE clause. The attributes of the resulting relation which are not in
the “preserved” relation are set to NULL for those tuples which did not satisfy
the WHERE clause.

For example, to join the Dept and Emp relations in our 1NF database,
without losing the department data for departments that do not have any
employees, we would use the PRESERVE clause as follows.

SELECT  * 

FROM  Dept,  Emp 

WHERE  Dept.dno  =  Emp.dno 

PRESERVE  Dept 
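
The behaviour of PRESERVE Dept corresponds to what is commonly called a left outer join. The Python sketch below illustrates it under assumptions: Dept tuples are (dno, dname, loc), Emp tuples are (eno, ename, dno, sal), the rows are a subset derived from Figure 8-2, and the function name is hypothetical.

```python
dept = [(10, "Manufacturing", "Austin"), (20, "Personnel", "Dallas")]
emp = [(13, "Smith", 10, 20000), (34, "Adams", 10, 35000)]  # (eno, ename, dno, sal)


def join_preserve_dept(depts, emps):
    """Join on dno, keeping every Dept tuple even when no Emp tuple matches."""
    result = []
    for d in depts:
        matches = [e for e in emps if e[2] == d[0]]
        if matches:
            result.extend(d + e for e in matches)
        else:
            # No employee satisfied the WHERE clause: pad Emp's attributes with nulls.
            result.append(d + (None,) * 4)
    return result


joined = join_preserve_dept(dept, emp)
```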


Finally,  a  clarification  of  the  relationship  between  empty  relations  and 
null  values  is  in  order.  For  reasons  discussed  in  Chapter  7,  we  note  that  the 
empty  relation  is  equivalent  to  any  relation  in  which  all  attributes  of  all  tuples 
have  null  values  for  the  atomic  attributes  and,  recursively,  empty  relations  for 
the  nested  relations.  Under  subsumption,  all  of  these  relations  are  equivalent 
to  a  single  tuple  relation,  where  the  value  of  each  attribute  is  null  or  empty. 
Since  we  have  a  single  type  of  null  in  SQL/NF,  we  assume  the  most  general 
interpretation,  that  is,  the  no-information  interpretation.  This  means  we  do 
not  know  whether  or  not  an  actual  value  exists  which  could  replace  this  null. 
Since  empty  relations  are  equivalent  to  a  relation  with  null-tuples,  we  assign 
the  no-information  interpretation  to  empty  relations  as  well. 

8.1.4  Miscellaneous  Features 

8.1.4.1 Unnesting after a Function

When a column or table function is applied to a nested relation, it does not make
sense to retain the relation structure for a single tuple. Therefore, a function
causes the relation in which it occurs to be unnested one level. For example,
instead of the result of our query to get department numbers and the number
of employees in each department having tuples {<10, {3}>, <20, {0}>,
<30, {1}>, ...}, we would have {<10, 3>, <20, 0>, <30, 1>, ...}. This
feature also allows easier application of multiple functions. For instance, to get
the total number of employees in the company from our ¬1NF database we
would write

SUM(SELECT  COUNT(Emps) 

FROM  Company) 

If the COUNT function did not unnest its result, the SUM function would receive
sets of counts as arguments and would not work properly.
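
The need for this unnesting can be seen in a short Python sketch over the three departments visible in Figure 8-2. It is an illustration only: nested relations are modeled as lists, the one-element sets stand for the single-tuple relations a function would otherwise return, and the variable names are assumptions.

```python
# The three departments shown in Figure 8-2; Emps holds (eno, ename, sal) tuples.
company = [
    {"dno": 10, "Emps": [(13, "Smith", 20000), (34, "Adams", 35000), (48, "Miller", 40000)]},
    {"dno": 20, "Emps": []},
    {"dno": 30, "Emps": [(33, "Jones", 14000)]},
]

# Without unnesting, each count stays wrapped in a single-tuple relation.
counts_nested = [(c["dno"], {len(c["Emps"])}) for c in company]

# With unnesting, each count is an ordinary atomic value.
counts_flat = [(c["dno"], len(c["Emps"])) for c in company]

# SUM(SELECT COUNT(Emps) FROM Company) over the unnested counts.
total_employees = sum(n for _, n in counts_flat)
```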

8.1.4.2 Attribute Lists

Sometimes it is easier to list the attributes you do not want to deal with. For
this, SQL/NF allows the construct “ALL BUT attribute-list”. Recall a previous
example in which we were interested in getting the total sales in each month
from a Sales relation with employee number and 12 sales attributes, one for
each month. In that query we had to list all 12 sales attributes, when it would
be much easier to list the one attribute we were not interested in, eno. Our
query then becomes

SUM(SELECT  ALL  BUT  eno 
FROM  Sales) 

8.1.4.3 Don’t Care Values

When comparing constant values with attribute values, a “don’t care” value
is useful for making wild card comparisons. Our “don’t care” value is the
question mark (?). To illustrate its use, consider the query to get Company
tuples where one of the employees has name “Smith” and salary $20,000. One
way to write this query is to look for an employee tuple with “ename = ‘Smith’”,
“sal = 20000”, and any value for eno. We can use our “don’t care” value as
follows

Company WHERE <?, "Smith", 20000> IN Emps

This is certainly more straightforward than the alternative

Company WHERE EXISTS
    (Emps WHERE ename = "Smith"
     AND sal = 20000)

Various  other  text  matching  facilities  could  be  incorporated.  RDL 
has  a  proposed  text  matching  facility  based  on  the  SQL  “LIKE”  predicate,  and 
much  of  Schek’s  work  (cf.  [Schl])  has  been  involved  with  text  retrieval  in  a 
database  system. 

8.1.5  Data  and  Relation  Restructuring  Operations 

Two operations, NEST and UNNEST, are provided for restructuring relations into
either more or less nested forms. One operation, ORDER, rearranges the tuples
of a relation.

The restructuring operations correspond to the nest and unnest operators
of the ¬1NF relational algebra. The syntax of these operators is

NEST (query)                   UNNEST (query)
ON attribute-list [AS name]    ON attribute-list


In the following, let Rel be the relation formed by (query). The NEST
operation partitions Rel on the attributes not specified in the attribute-list. For
each partition, a new tuple is created with the values of the attributes in the
attribute-list collected into a new nested relation. The nested relation is given
an optional name but, if not named, it cannot be referenced any place else in
the query. Note that if any nested relation formed consists solely of tuples in
which every attribute has a value which is null or the empty relation, then the
value of this nested relation is the empty relation.
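
The partitioning performed by NEST can be sketched in Python as follows. This is a minimal illustration under assumptions: relations are lists of dictionaries, the nested attributes are atomic, NULL is modeled as None, the function name nest is hypothetical, and the sample Emp rows (including an all-null row for department 20) are constructed for the example.

```python
def nest(relation, nested_attrs, name):
    """Partition on the attributes not in nested_attrs and collect the
    nested_attrs values of each partition into a relation called name."""
    groups = {}
    for row in relation:
        key = tuple((k, v) for k, v in row.items() if k not in nested_attrs)
        inner = tuple(row[k] for k in nested_attrs)
        members = groups.setdefault(key, [])
        if inner not in members:
            members.append(inner)
    result = []
    for key, members in groups.items():
        # A nested relation made up solely of all-null tuples becomes empty.
        if all(v is None for t in members for v in t):
            members = []
        result.append({**dict(key), name: members})
    return result


emp = [
    {"eno": 13, "ename": "Smith", "dno": 10, "sal": 20000},
    {"eno": 34, "ename": "Adams", "dno": 10, "sal": 35000},
    {"eno": 33, "ename": "Jones", "dno": 30, "sal": 14000},
    {"eno": None, "ename": None, "dno": 20, "sal": None},
]
nested = nest(emp, ["eno", "ename", "sal"], "Emps")
```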

Let us go through a step by step building of a query to convert the
1NF database of Figure 8-1 to the ¬1NF database of Figure 8-2.

First we will nest the employee relation by collecting eno, ename, and
sal into a nested relation called Emps.

NEST Emp
ON eno, ename, sal AS Emps

Next, we join this relation with the Dept relation on dno and eliminate
one of the duplicate dno columns.

SELECT ALL BUT Emp.dno
FROM Dept, (NEST Emp
            ON eno, ename, sal AS Emps)
WHERE Dept.dno = Emp.dno

This query produces all tuples in the Company relation where departments
have employees. If we want to also include the departments which do not
have employees, assigning a null tuple to the nested Emps relation, we need to
preserve the Dept relation. The final query is

SELECT ALL BUT Emp.dno
FROM Dept, (NEST Emp
            ON eno, ename, sal AS Emps)
WHERE Dept.dno = Emp.dno
PRESERVE Dept

The  UNNEST  operation  creates  several  tuples  for  each  tuple  in  Rel,  by 
concatenating  the  attributes  not  specified  in  the  attribute-list  with  a  tuple  from 
each  of  the  attributes  that  is  specified.  The  attributes  of  the  unnested  relations 
now  become  attributes  of  Rel.  Note  that  an  empty  relation  unnests  to  a  single 
tuple  with  null  values  for  each  atomic  attribute  and  an  empty  relation  for  each 
nested  relation. 
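
A one-level UNNEST, including the rule for empty relations, can be sketched in Python as follows. The sketch is an illustration under assumptions: the nested relation holds only atomic attributes, relations are lists of dictionaries, NULL is modeled as None, and the function name unnest is hypothetical.

```python
def unnest(relation, nested_name, nested_attrs):
    """Emit one tuple per member of the nested relation; an empty nested
    relation unnests to a single tuple of nulls."""
    result = []
    for row in relation:
        outer = {k: v for k, v in row.items() if k != nested_name}
        inner_rows = row[nested_name] or [tuple(None for _ in nested_attrs)]
        for inner in inner_rows:
            result.append({**outer, **dict(zip(nested_attrs, inner))})
    return result


# The three departments shown in Figure 8-2 (dname and loc omitted for brevity).
company = [
    {"dno": 10, "Emps": [(13, "Smith", 20000), (34, "Adams", 35000), (48, "Miller", 40000)]},
    {"dno": 20, "Emps": []},
    {"dno": 30, "Emps": [(33, "Jones", 14000)]},
]
flat = unnest(company, "Emps", ["eno", "ename", "sal"])

# Dropping the null-padded tuples recovers the employee data.
emp = [row for row in flat if row["eno"] is not None]
```

The null-padded tuple for department 20 is the reason a query rebuilding Emp from the unnested relation needs to filter on a non-null eno.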


To  unnest  the  Company  relation  we  write 

UNNEST  (Company) 

ON  Emps 


To convert the ¬1NF database to the 1NF database we issue a query for each
relation. To get the Emp relation we write

SELECT  eno,  ename,  dno,  sal 
FROM  (UNNEST  (Company) 

ON  Emps) 

WHERE  eno  IS  NOT  NULL 


and,  to  get  the  Dept  relation  we  write 

SELECT  dno,  dname,  loc 
FROM  Company 

In SQL, the ORDER BY clause is added to a query if tuples are to be
sorted in some particular order before being output to the user. We retain this
function, but modify its syntax to match the other functions in our language.
The new syntax is

ORDER (query)
BY name [ASC | DESC] {, name [ASC | DESC]}

To sort the Company relation into ascending order by location, we
write

ORDER Company
BY loc ASC

8.1.6  Name  Inheritance  and  Aliasing 

The  attributes  of  the  relations  formed  in  the  FROM  clause  of  a  SFW-expression 
may  be  used  in  several  places  in  a  query.  They  may  be  referenced  directly  in 
(1)  the  SELECT  clause,  (2)  the  FROM  clause  of  a  nested  SFW-expression,  or  (3) 
the  WHERE  clause.  Attributes  may  be  referenced  also  in  the  WHERE  clause  of 
any  nested  SFW-expression.  A  problem  occurs  when  attribute  names  are  not 
unique.  This  can  be  due  to  the  need  to  use  multiple  copies  of  a  relation  in 
a  single  query  or  to  the  presence  of  identical  names  in  different  relations,  or 
nested  relations. 

In the first case, when we need to use multiple copies of a relation,
the solution is to introduce reference names for the relations. For example,
the query to get all pairs of department names that exist at the same location
requires a reference name for the Company relation. A reference name is
specified by including the key word AS and the new name.

SELECT First.dname, Second.dname
FROM Company AS First, Company AS Second
WHERE First.loc = Second.loc AND First.dno < Second.dno

If necessary, or desired, reference names can also be used for nested
SFW-expressions and for attribute names. If a single relation X is specified in
the FROM clause of an SFW-expression, the name of the resulting relation defaults
to X. However, if there is more than one relation in the FROM clause, then there
is no default, and a reference name is required if the resulting relation is going
to be referenced elsewhere in the query. Consider the last query to get pairs
of department names. Let us use the result of this query to get all triples of
department names at the same location, and rename the attributes to dname1,
dname2, and dname3. We will use the reference name Pairs for the last query
and use Pairs2 for a new reference name for Pairs.

SELECT Pairs.First.dname AS dname1, Pairs.Second.dname AS dname2,
       Pairs2.Second.dname AS dname3
FROM (SELECT First.dname, Second.dname
      FROM Company AS First, Company AS Second
      WHERE First.loc = Second.loc AND First.dno < Second.dno) AS Pairs,
     Pairs AS Pairs2
WHERE dname2 = Pairs2.First.dname

Note how reference names are cascaded when necessary to distinguish
attribute names. Similarly, if it is necessary to distinguish identical names that
occur at different nesting levels of a relation, then each attribute name can be
prefixed by the relation name of the nested relation in which it occurs. In
addition, unnesting a relation via the UNNEST operator may require that the
unnested relation's name be attached to any names which would otherwise be
identical in the resulting relation.

Reference names are useful for simplifying queries in which a nested
query expression is used in several places in the query. Recall from section 8.1.1
the query to get department names and employees making more than $35,000,
eliminating departments with no employees meeting the salary requirement.
Our solution then was

SELECT  dname,  (Emps  WHERE  sal  >  35000) 

FROM  Company 

WHERE  EXISTS(Emps  WHERE  sal  >  35000) 

By  using  a  reference  name  for  the  nested  query  on  Emps,  the  duplication  can 
be  eliminated,  as  follows: 

SELECT  dname,  (Emps  WHERE  sal  >  35000)  AS  Emps-rich 

FROM  Company 

WHERE  EXISTS  (Emps-rich) 

8.2  Data  Manipulation  Language 

In this section we discuss commands to store, modify, and erase data from
relations in the database. These commands can be thought of as functions
which transform relations into other relations by adding, changing, or deleting
data from them. Just as an SFW-expression produces relations from relations,
the DML commands perform similarly with the additional effect that the new
relations replace the old relations in the database. Thinking of DML commands
as functions is critical if we want to apply them to ¬1NF relations. In a ¬1NF
model we will need to manipulate nested relations as easily as we manipulate
traditional relations.

To get the feel for our syntax (adapted from the RDL standard
[X3H2]), let us start with some examples on the 1NF database of Figure 8-1.
The STORE statement can be used to add user-specified tuples to a relation or
to add tuples retrieved via a query specification to a relation. To add two new
departments to the Dept relation, we write

STORE  Dept 

VALUES  <50,  Training,  Waco> 

<60,  Sales,  Austin> 

Suppose we had a relation New-Dept with two attributes, deptno and deptname,
which contained information on some new departments. If we want to store
this data in the Dept relation we write

STORE  Dept(dno,  dname) 

SELECT  ALL 
FROM  New-Dept 

For each of the New-Dept tuples stored in Dept, the loc attribute will be set to
the  default  value  defined  for  loc  in  the  schema  definition,  or  NULL  if  no  default 
value  was  specified.  In  general,  an  arbitrary  SFW-expression  can  be  used  in 
the  STORE  command  to  specify  the  tuples  to  be  stored.  Of  course,  the  relation 
created  must  be  compatible  (number  of  attributes  and  domain  types)  with  the 
relation  being  stored  into. 

The MODIFY command is used to replace values with others in the
database. Suppose we want to give every employee in the Emp relation a 10%
raise. We would write

MODIFY  Emp 

SET  sal  =  sal  *  1.1 


If  we  want  to  limit  the  raise  to  those  employees  in  department  10,  we  add  a 

WHERE  clause  to  the  query  as  follows: 

MODIFY  Emp 

SET  sal  =  sal  *  1.1 
WHERE  dno  =  10 


In  general,  we  can  specify  more  than  one  replacement  in  the  SET  clause,  and 
qualify  the  tuples  to  be  modified  via  an  arbitrary  predicate  in  the  WHERE  clause. 

The ERASE command is used to delete tuples from relations. An optional
WHERE clause is used to identify the tuples to be deleted. Let us delete
all departments with no employees.

ERASE  Dept 

WHERE  NOT  EXISTS  (Emp  WHERE  Dept.dno  =  Emp.dno) 

All three DML commands operate by first computing all changes,
and then making all changes to the relation in one atomic action. This way the
relation being changed may be referenced in a nested SFW-expression without
fear of it changing while the command is being executed. For example, suppose
we want to delete all employees whose salary is greater than the current average
salary. The appropriate ERASE command is
ERASE  Emp 

WHERE  sal  >  AVG(SELECT  sal  FROM  Emp) 


If, instead of the above rule, we recalculated the average salary as we checked
and perhaps deleted each tuple in Emp, the result would depend on the order in
which tuples were visited, and it is possible to wind up deleting nearly all
tuples in Emp!
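
The difference between the two evaluation rules can be sketched in Python. This is an illustration under assumptions: Emp is reduced to a non-empty list of salaries, the salary values are invented, and both function names are hypothetical.

```python
def erase_snapshot(salaries):
    """Compute the average once, then delete in one atomic step."""
    avg = sum(salaries) / len(salaries)
    return [s for s in salaries if not s > avg]


def erase_incremental(salaries):
    """Recompute the average after every deletion (the rule being avoided)."""
    remaining = list(salaries)
    for s in list(salaries):
        avg = sum(remaining) / len(remaining)
        if s > avg:
            remaining.remove(s)
    return remaining


salaries = [40000, 30000, 20000, 10000]
kept_snapshot = erase_snapshot(salaries)
kept_incremental = erase_incremental(salaries)
```

With the tuples visited in descending salary order, each deletion lowers the average and so exposes the next tuple to deletion, leaving a single employee where the atomic rule leaves two.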

Now let us focus on the particular problem that ¬1NF relations pose
for our DML commands. We need a way of performing the three DML commands
on individual nested relations. No matter which operation we perform
on a nested relation, we are changing only the relation in which the updated
relation is nested. Therefore, all changes to nested relations are done with a
MODIFY command on the database relation. For the next set of examples we use
the ¬1NF database of Figure 8-2. Suppose we want to insert a new employee,
<32, Samuels, 49000>, working in department 10. An outline of the required
command is

MODIFY  Company 
SET  Emps  =  X 
WHERE  dno  =  10 

Since  Emps  is  a  nested  relation,  what  should  we  use  for  X  in  this  operation? 
Since  we  allow  any  atomic-valued  expression  to  be  used  when  the  attribute 
being  changed  is  atomic,  we  allow  any  relation  valued  expression  to  be  used 
when  the  attribute  being  changed  is  a  relation.  Thus,  one  legitimate  solution 
is  to  replace  X  with 

(Emps  UNION  <32,  Samuels,  49000>) 

Another, more general solution is to replace X with a “nested” STORE command
on the Emps relation. The total query is then

MODIFY Company
SET Emps = (STORE Emps
            VALUES <32, Samuels, 49000>)
WHERE dno = 10


In general, we can use UNION instead of STORE and DIFFERENCE instead of
ERASE. However, there is usually not a good way to simulate MODIFY using a
query expression. The command to give a 10% raise to each employee in
department 10 who makes more than $30,000 is written

MODIFY  Company 

SET  Emps  =  (MODIFY  Emps 

SET  sal  =  sal  *  1.1 
WHERE  sal  >  30000) 

WHERE  dno  =  10 


The alternative query expression for the nested MODIFY is the much more
complex expression:

SELECT eno, ename, sal * 1.1
FROM Emps
WHERE sal > 30000
UNION
Emps WHERE sal <= 30000
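
The equivalence between the nested MODIFY and the UNION-based query
expression can be checked outside SQL/NF. The following Python sketch is not
part of SQL/NF: it models a relation as a list of dictionaries, and the helper
names (modify, raise_by_union, as_set) as well as the sample tuples are
assumptions made for illustration, since the contents of Figure 8-2 are not
reproduced here.

```python
# Hypothetical sample data: Company is a nested relation whose Emps
# attribute is itself a relation (a list of employee tuples).
company = [
    {"dno": 10, "dname": "Sales", "loc": "Austin",
     "Emps": [{"eno": 1, "ename": "Jones", "sal": 40000},
              {"eno": 2, "ename": "Smith", "sal": 25000}]},
    {"dno": 20, "dname": "Research", "loc": "Dallas",
     "Emps": [{"eno": 3, "ename": "Brown", "sal": 50000}]},
]


def modify(relation, assign, where=lambda t: True):
    """Return a new relation; tuples satisfying `where` get the
    attribute values computed by `assign`, others are copied as is."""
    return [{**t, **assign(t)} if where(t) else dict(t) for t in relation]


def raise_by_union(emps):
    """The UNION formulation: raised tuples plus the untouched ones."""
    raised = [{**e, "sal": e["sal"] * 1.1} for e in emps if e["sal"] > 30000]
    kept = [dict(e) for e in emps if e["sal"] <= 30000]
    return raised + kept


def as_set(emps):
    """Order-insensitive view of a relation of employee tuples."""
    return {(e["eno"], e["ename"], e["sal"]) for e in emps}


# MODIFY Company
# SET Emps = (MODIFY Emps SET sal = sal * 1.1 WHERE sal > 30000)
# WHERE dno = 10
updated = modify(
    company,
    assign=lambda d: {"Emps": modify(d["Emps"],
                                     assign=lambda e: {"sal": e["sal"] * 1.1},
                                     where=lambda e: e["sal"] > 30000)},
    where=lambda d: d["dno"] == 10,
)
```

In this sketch both formulations produce the same set of employee tuples for
department 10, but the nested form states the intent (change sal where a
condition holds) directly, which is the point made above.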


In summary, when tuples are to be stored in, modified in, or erased from
a nested relation, either a query expression can be constructed to perform the
modification or the appropriate DML command can be used. In either case,
any operation on a nested relation is done within a MODIFY command on the
relation containing the nested relation.

8.3 The SQL/NF Data-Definition Language


In  standard  SQL,  it  is  possible  to  define  relations  using  the  CREATE  TABLE 
command,  and  views  using  the  DEFINE  VIEW  command.  As  part  of  the  CREATE 
TABLE  command,  the  user  specifies  the  attribute  names  and  the  domain  (e.g., 
integer,  character)  to  be  associated  with  each  attribute  of  the  relation.  In 
the  proposed  RDL  standard,  base  tables  and  views  are  defined  in  a  SCHEMA 
command,  which  includes  a  TABLE  command  for  each  base  table  being  defined, 
and  a  VIEW  command  for  each  view  being  defined.  In  addition  to  specifying  the 
attribute  names  and  their  domains,  a  variety  of  integrity  constraints  can  also 
be specified (UNIQUE, NOT NULL, REFERENCES ..., CHECK ...).¹


We shall adopt the RDL framework for the SQL/NF data-definition
language. However, we shall need to make appropriate modifications to allow
for definition of ¬1NF relations. Let us first show the definitions for the 1NF
database in Figure 8-1.

SCHEMA
  TABLE Dept
    ITEM dno INTEGER UNIQUE NOT NULL
    ITEM dname CHARACTER 10
    ITEM loc CHARACTER 10
  TABLE Emp
    ITEM eno INTEGER UNIQUE NOT NULL
    ITEM ename CHARACTER 10
    ITEM dno INTEGER REFERENCES Dept.dno
    ITEM sal REAL


¹ See [X3H2] for details on these constraints.

Each ITEM command defines a column in the relation. The UNIQUE constraint
specifies  that  no  duplicates  are  allowed  for  the  attribute  (thus  forming  a  key 
for  the  relation).  The  NOT  NULL  constraint  specifies  that  no  null  values  are 
allowed  for  the  attribute,  and  the  REFERENCES  constraint  disallows  any  value 
for  the  attribute  that  is  not  a  value  in  the  referenced  column.  Note  that  these 
constraints  are  also  allowed  as  separate  clauses  in  a  TABLE  definition.  This  is 
especially  needed  when  two  or  more  attributes  are  to  be  key  for  a  relation  and 
their  combination  must  be  specified  as  UNIQUE  (see  the  Appendix  for  syntax). 

Following the principle of orthogonality, in order to define ¬1NF relations,
we must allow TABLE definitions wherever an atomic-valued specification
could occur before. The definitions for the ¬1NF database of Figure 8-2 are:

SCHEMA
  TABLE Company
    ITEM dno INTEGER UNIQUE NOT NULL
    ITEM dname CHARACTER 10
    ITEM loc CHARACTER 10
    ITEM (TABLE Emps
      ITEM eno INTEGER UNIQUE
      ITEM ename CHARACTER 10
      ITEM sal REAL)


In order to simplify the definition of nested schemes, we allow for
the definition of relation schemes separately from the definition of the relations
themselves. This option is analogous to the option in Pascal of defining the type
of a variable directly, or by using a user-defined type. Therefore, we introduce
the SCHEME command, which can be used to specify table definitions without
actually creating a table. This command is especially useful when deeply nested
relations are being defined or when the same nested relation scheme is to appear
in more than one place.

The  formal  definitions  for  our  sample  corporation  database  follow. 

SCHEME
  TABLE PARTSET
    ITEM part INTEGER UNIQUE NOT NULL
  TABLE PERSON
    ITEM name CHARACTER 10 UNIQUE
    ITEM dob CHARACTER 8
  TABLE EMPLOYEE
    ITEM empno INTEGER UNIQUE
    ITEM name CHARACTER 10
    ITEM sal REAL
    ITEM mgr INTEGER REFERENCES EMPLOYEE.empno
    ITEM (TABLE Children PERSON)

SCHEMA
  TABLE Corp
    ITEM dno INTEGER UNIQUE
    ITEM dname CHARACTER 10
    ITEM loc CHARACTER 10
    ITEM (TABLE Emp EMPLOYEE)
    ITEM (TABLE Usage PARTSET)
  TABLE Supply
    ITEM supplier INTEGER UNIQUE
    ITEM (TABLE Supplies PARTSET)
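
The reuse of a SCHEME definition inside a SCHEMA can be pictured as the
expansion of type names, in the spirit of the Pascal analogy. The Python
sketch below is only an illustration under stated assumptions: the dictionary
layout and the expand function are hypothetical, and integrity constraints
such as UNIQUE and REFERENCES are left out.

```python
# Named schemes from the SCHEME command (constraints omitted).
SCHEMES = {
    "PARTSET": [("part", "INTEGER")],
    "PERSON": [("name", "CHARACTER 10"), ("dob", "CHARACTER 8")],
    "EMPLOYEE": [("empno", "INTEGER"), ("name", "CHARACTER 10"),
                 ("sal", "REAL"), ("mgr", "INTEGER"),
                 ("Children", "PERSON")],
}

# Tables from the SCHEMA command; a type that names a scheme denotes a
# nested relation with that scheme.
TABLES = {
    "Corp": [("dno", "INTEGER"), ("dname", "CHARACTER 10"),
             ("loc", "CHARACTER 10"), ("Emp", "EMPLOYEE"),
             ("Usage", "PARTSET")],
    "Supply": [("supplier", "INTEGER"), ("Supplies", "PARTSET")],
}


def expand(items):
    """Replace every scheme name by its (recursively expanded) items."""
    return {name: expand(SCHEMES[typ]) if typ in SCHEMES else typ
            for name, typ in items}
```

Here expand(TABLES["Supply"]) yields a nested description in which Supplies
holds the PARTSET items. PARTSET is written once although both Corp and
Supply use it, which is the convenience the SCHEME command is meant to provide.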


One noticeable absence from our language is the SQL CREATE INDEX
command. Indices are in the realm of physical database access concerns and
should not be a user-specified option. Unfortunately, in SQL, this command is
also the means used to specify the UNIQUE constraint on attributes. In SQL/NF,
this constraint has been moved to its rightful place in the schema definitions
and so the CREATE INDEX command is no longer necessary at the user level.

8.4  Comparison  with  Other  Languages 

In this section, we look at other database languages which have been developed
to deal with databases that are not based on the standard 1NF model. We only
briefly mention non-SQL-like languages and provide a more detailed comparison
of the SQL-like languages.

Non-SQL-like languages include those developed for functional data
models [Zan5, Shi] and those developed from a “Query-by-Example” model
[JW, Hsi]. The GEM language [Zan5] is a derivative of QUEL which works on
a semantic data model of the Entity-Relationship type. The DAPLEX language
[Shi] uses an English-like syntax which works on a functional data model. Both
GEM and DAPLEX use a functional composition notation to relieve users
of explicitly specifying joins. This composition is explicitly represented in the
¬1NF data model with the use of nested relations. GEM allows single attributes
to be set-valued one level deep. For example, an attribute color may have value
{green} or {yellow, red}. This corresponds to limiting rules in our model to the
form $R = (A_1, A_2, \ldots, A_n)$ where each $A_i$ is either zero order or a higher
order attribute with associated rule $A_i = (B)$ where $B$ is zero order. Neither GEM
nor DAPLEX supports explicit nesting or unnesting of set-valued attributes;
however, each retains a version of the SQL GROUP BY operation for executing
aggregate functions.


The language Unified Query-By-Example (UQBE) [Hsi] is based on
Jacobs’ database logic [Jac1] and the functional data model. UQBE queries
are translated into either QBE or a Functional Query Language which can be
translated into other languages like QUEL and SEQUEL. Jacobs’ own QBE-like
language, Generalized Query-By-Example (GQBE) [JW], is based strictly
on database logic. These languages operate on ¬1NF relations; however, the
two-dimensional format is quite different from a SQL-like language, so direct
comparison is not made. In fact, Jacobs has defined a Generalized SQL (GSQL)
language [Jac2] with power similar to the QBE-like languages. We will look at
GSQL later in this section.

One SQL-like language which also uses a form of functional composition
is SQL/N [Bra]. SQL/N is upwardly compatible with SQL and provides
“natural language” quantifiers, like “FOR ALL” and “THERE IS 1”, for joining
relations over common attributes. “PARENT” and “CHILD” relationships
between tuples are based on the foreign key concept. As we mentioned above,
the ¬1NF model allows explicit representation of these relationships with the
use of nested relations.

In the rest of this section, we provide more detailed comparisons of
SQL/NF with two languages designed for ¬1NF databases, GSQL and the database
language being developed at IBM Heidelberg for “NF²” relations [PHH,
PT, SP]. Figure 8-3 shows some example queries, written in SQL/NF, GSQL,
and the NF² query language.

SQL/NF

1. SELECT dname,
          (SELECT eno, ename
           FROM Emps
           WHERE sal > 35000)
   FROM Company

2. SELECT dno, COUNT(Emps)
   FROM Company

3. NEST Company
   ON ALL BUT loc AS Depts

4. UNNEST Company
   ON Emps

5. SELECT dname, ename
   FROM (UNNEST Company
         ON Emps)

GSQL

1. SELECT (dname, E) AS DE
          (eno, ename) AS E
   FROM Company
   WHERE sal > 35000

2. No translation known

3. SELECT (loc, Depts)
          (dno, dname, Emps) AS Depts
   FROM Company

4. SELECT dno, dname, loc,
          eno, ename, sal
   FROM Company

5. SELECT dname, ename
   FROM Company

NF²

1. SELECT {X.dname,
           (SELECT {XX.eno, XX.ename}
            FROM XX IN X.Emps
            WHERE XX.sal > 35000)}
   FROM X IN Company

2. SELECT {X.dno, CARD(X.Emps)}
   FROM X IN Company

3. SELECT {X.loc,
           (SELECT {Y.dno, Y.dname,
                    Y.Emps}
            FROM Y IN Company
            WHERE Y.loc = X.loc)}
   FROM X IN Company

4. SELECT {X.dno, X.dname, X.loc, Y}
   FROM X IN Company,
        Y IN X.Emps

5. SELECT {X.dname, Y.ename}
   FROM X IN Company,
        Y IN X.Emps

Queries:

1. Get department names and employee numbers and names for all employees
   making more than $35,000.

2. Get department numbers and the number of employees in each department.

3. Create a new nested relation called Depts which contains all departments
   in each location.

4. Get the 1NF version of Company.

5. Get pairs of department names and employee names where the employee
   works in the department.

Figure 8-3. Five sample queries written in SQL/NF, GSQL,
and NF² Query Language.

8.4.1 Generalized SQL


The GSQL language is a generalization to database logic of relational SQL.
Nested relations are called clusters. GSQL does not support nested SFW-expressions
in the FROM or WHERE clauses. All WHERE clause predicates, whether
they apply to a nested relation or not, are included in the single WHERE clause
of each query. If a predicate references an atomic attribute of the database
relation then entire tuples are selected or rejected; however, if the attribute
is in a nested relation then tuples from that nested relation are selected or
rejected. In the SELECT clause, attributes may be included from anywhere in
the relation. If some attributes are from nested relations, they are unnested
appropriately. The attributes selected can be renested in an arbitrary way by
specifying clusters in the SELECT clause (see query 1 in Figure 8-3). Functions
may be specified as in standard SQL; however, no support for GROUP BY is
mentioned in [Jac2].

The major disadvantage of GSQL is its extreme lack of orthogonality,
as witnessed by the lack of nested SFW-expressions in the SELECT and FROM
clauses, and the hidden unnesting that goes on when attributes are selected. Of
course, the same problems with functions in SQL are present in GSQL, since
there is no change from SQL in this area.

8.4.2 Query Language for NF² Relations

The NF² query language has syntax and properties that are similar to SQL/NF.
In [PT], some of the query facilities are described, and these were used to
generate the example queries in Figure 8-3. The language includes many more
built-in functions than standard SQL, including several functions to work with
a “list” data structure. It also retains the “GROUP BY” function and includes
an inverse operation, “DUNION.” There is a new syntax for “GROUP BY”
which aligns it with the syntax of our NEST and UNNEST functions:

GROUP reference-name IN relation-name
BY reference-name.attribute-name

There is, however, no indication of how this new grouped relation is used in a
query, particularly in applying aggregate operators to the groups.

Although present in an earlier draft of the language [PHH], nest and unnest
functions are not included in the latest report on the NF² query language [PT].
Nesting can be simulated using a nested expression in the select
clause (see NF² query 3); however, it is very cumbersome to specify. Unnesting
can also be done using nested expressions, but [PT] recommends a “hierarchical
join” operation as used in NF² queries 4 and 5. According to our principle of
orthogonality, a distinct unnest operator should be used rather than loading
the join operation with a new function.
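
To make the role of distinct nest and unnest operators concrete, the
following Python sketch models both on relations represented as lists of
dictionaries. It is an illustration only, not the SQL/NF definition: the
function names, the representation, and the sample Company tuples are
assumptions, and null values and empty nested relations are not handled (a
tuple whose nested relation is empty simply produces no flat tuples here).

```python
def unnest(relation, attr):
    """Flatten the relation-valued attribute `attr` by one level."""
    flat = []
    for t in relation:
        outer = {k: v for k, v in t.items() if k != attr}
        for inner in t[attr]:
            flat.append({**outer, **inner})
    return flat


def nest(relation, nested_attrs, name):
    """Group tuples that agree on all other attributes and collect the
    `nested_attrs` values of each group into a relation called `name`."""
    groups = {}
    for t in relation:
        key = tuple((k, v) for k, v in t.items() if k not in nested_attrs)
        inner = {k: t[k] for k in nested_attrs}
        members = groups.setdefault(key, [])
        if inner not in members:
            members.append(inner)
    return [{**dict(key), name: members} for key, members in groups.items()]


# Hypothetical contents for the Company relation.
company = [
    {"dno": 10, "dname": "Sales", "loc": "Austin",
     "Emps": [{"eno": 1, "ename": "Jones", "sal": 40000},
              {"eno": 2, "ename": "Smith", "sal": 25000}]},
    {"dno": 20, "dname": "Research", "loc": "Dallas",
     "Emps": [{"eno": 3, "ename": "Brown", "sal": 50000}]},
]

flat = unnest(company, "Emps")                      # query 4
pairs = [(t["dname"], t["ename"]) for t in flat]    # query 5
renested = nest(flat, ["eno", "ename", "sal"], "Emps")
```

With a separate unnest step, query 5 becomes an ordinary projection over the
flattened relation, and nesting the flat result on the employee attributes
recovers the original Company relation for this sample data.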

Chapter 9

A  New  Approach  to  Nested  Normal  Form 


In Chapter 3, we briefly described a normal form for ¬1NF relations, called
nested normal form (NNF), which was first introduced in [OY1]. [OY1] gives
an algorithm to obtain an NNF decomposition of a set of attributes U with
respect to a set of MVDs M. The decomposition explicitly represents a set of
full and embedded MVDs implied by M, and is a faithful and nonredundant
representation of U. NNF relations are better than relations with only the PNF
property, since NNF implies PNF and also eliminates partial and transitive
dependencies which appear in the relations.

The algorithms given in [OY1] to produce NNF use as input dependencies
a set of MVDs and the MVD counterparts of a set of FDs. The authors
acknowledge the deficiency which this approach to FDs creates in the design
and provide a framework for a unified approach to MVDs and FDs in [YO].
The key idea is to modify the set of MVDs which are used as input to the
decomposition algorithm so the different semantics of the FDs are appropriately
accounted for.


Example 9.1: Let $U = ELSC$ and $D = \{E \twoheadrightarrow S,\ E \to L\}$, where E is an
employee id, L is the employee’s location, S is an employee’s skill, and C is an
employee’s child. Using the method of [OY1], we would use the MVD $E \twoheadrightarrow L$
implied by $E \to L$ and create the ¬1NF relation with scheme tree shown in
Figure 9-1a.

      E                  E,L
    / | \                / \
   L  S  C              S   C

     (a)                 (b)

Figure 9-1. Scheme trees for Example 9.1 using (a) the approach of [OY1], and
(b) the approach modified for different FD semantics.

However, since $E \to L$, each L-set created by this scheme will be a singleton
set. Therefore, we should use the scheme tree of Figure 9-1b. □
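
The singleton-set effect in Example 9.1 can be seen on a small instance. The
Python sketch below is illustrative only: the tuples are invented sample data
chosen to satisfy the dependencies of the example, and the two functions are
hypothetical helpers that build nested structures following the scheme trees
of Figure 9-1.

```python
# Hypothetical flat tuples over (E, L, S, C) satisfying E -> L and E ->> S.
rows = [
    ("e1", "NY", "typing", "Ann"),
    ("e1", "NY", "typing", "Bob"),
    ("e1", "NY", "filing", "Ann"),
    ("e1", "NY", "filing", "Bob"),
    ("e2", "LA", "welding", "Cy"),
]


def tree_a(rows):
    """Scheme tree (a): root E with set-valued children L, S and C."""
    out = {}
    for e, l, s, c in rows:
        node = out.setdefault(e, {"L": set(), "S": set(), "C": set()})
        node["L"].add(l)
        node["S"].add(s)
        node["C"].add(c)
    return out


def tree_b(rows):
    """Scheme tree (b): root (E, L) with set-valued children S and C."""
    out = {}
    for e, l, s, c in rows:
        node = out.setdefault((e, l), {"S": set(), "C": set()})
        node["S"].add(s)
        node["C"].add(c)
    return out
```

Under tree (a) every L-set in this instance holds exactly one location, so the
extra level of nesting carries no information; tree (b) keeps the same skills
and children per employee with L moved into the root.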

Although an approach to better handling FDs in an NNF design is
being pursued [Ozs], the introduction of embedded MVDs has not yet been
considered. One reason for this is that the implication problem for EMVDs
has not been solved. That is, given a finite set of attributes U, there is no
known complete axiomatization of EMVDs. Furthermore, as we mentioned in
Chapter 2, if the set of attributes is infinite, then there is provably no complete
axiomatization. There are, however, several sound inference rules for EMVDs
and for EMVDs together with MVDs and FDs. Thus, if we are given that
an EMVD should hold in our database, we can use that knowledge, plus any
dependencies derivable from the known inference rules, to improve our database
design. One of the contributions of our new approach to NNF design is to
include EMVDs in the set of input dependencies.

In addition to including EMVDs in the design, we take a different
approach to the design of NNF relations, which gives the designer of a ¬1NF
scheme more control over the final outcome. As proved in [OY1], the design
scheme produces a unique result if and only if the input dependencies are
conflict free†. Furthermore, we are guaranteed that the path set of the scheme
trees created is in 4NF only if the input dependencies are conflict free. Otherwise,
there are several designs which will satisfy the NNF requirements, not all
of which will have 4NF path sets.

† Please refer to sections 4 and 5 of Chapter 2 for the explanation and definition of
many terms used in this chapter concerning dependencies and normal forms.

In the approach of [OY1], these different designs result from using different
selections of fundamental keys to decompose a set of attributes into several
branches of the scheme tree, and from using different orderings of all keys to test
for partial and transitive dependencies and essential dependents. These different
orderings of keys may cause different scheme trees to be split apart. The
following example will make this explanation clearer.

Example 9.2: Consider a scouting database with attributes BSL (boy scout
leader), GSL (girl scout leader), Boy, Girl, Date (when a Boy and Girl went
out to eat), and Dance (when a Boy and Girl went to a dance). The dependencies
which are assumed to hold in this database are BSL $\twoheadrightarrow$ Boy, GSL
$\twoheadrightarrow$ Girl, (Boy, Girl) $\twoheadrightarrow$ Date, and (Boy, Girl) $\twoheadrightarrow$ Dance. We also have the
EMVD $\emptyset \twoheadrightarrow$ BSL | GSL; however, EMVDs are not considered in the approach
of [OY1]. Following the approach of [OY1] we create an initial scheme tree by
decomposing the entire set of attributes based on the fundamental keys {BSL,
GSL, (Boy, Girl)}. The decomposition algorithm arbitrarily chooses one of
these keys to perform the initial decomposition, and depending on which one
is chosen quite different NNF designs result. The two initial decompositions
which result from using BSL or GSL are shown in Figure 9-2.

        BSL                        GSL
       /   \                      /   \
    Boy     GSL               Girl     BSL
          /  |  \                    /  |  \
      Girl Date Dance             Boy Date Dance

          (a)                         (b)

Figure 9-2. Two initial scheme trees for Example 9.2,
using (a) BSL, and (b) GSL to decompose.


When BSL is used to decompose the initial scheme tree, the tree has a
partial dependency GSL $\twoheadrightarrow$ Girl, and so the edge (GSL, Girl) is removed from
Figure 9-2a, and a new tree created with the single edge (GSL, Girl). Similarly,
when GSL is used to decompose the initial scheme tree, the tree has partial
dependency BSL $\twoheadrightarrow$ Boy, and so the edge (BSL, Boy) is removed from Figure
9-2b, and a new tree created with the single edge (BSL, Boy). The scheme
trees which result in these two cases have 4NF path sets (BSL, Boy), (BSL,
GSL, Date), (BSL, GSL, Dance), and (GSL, Girl). However, in neither case is
the particular decomposition very intuitive. Take the trees which result from
starting with BSL. The scheme trees show that for each boy scout leader there
is a set of boys and a set of girl scout leaders. And for each girl scout leader
associated with a boy scout leader there is a set of Dates and a set of Dances.
Also for each girl scout leader there is a set of girls. The relationship between
BSL, GSL and Date and Dance is only an indirect one via the leaders’ associated
boys and girls. Nevertheless, these two schemes are the ones recommended by
[OY1].

The other alternative is to use (Boy, Girl) as the fundamental key to
start the decomposition. However, this choice is not allowed by [OY1] since the
MVDs with left hand sides (Boy, Girl) are split by the other given MVDs, and
this will result in a path set which is not 4NF. However, let us explore what
happens if we do use this fundamental key, primarily to compare later with
results achieved by our design approach for this problem. Using (Boy, Girl) as
the fundamental key, two alternatives for the initial decomposition are shown
in Figure 9-3. The two alternatives result from a decision to use either BSL or
GSL as the fundamental key when decomposing node (BSL, GSL).

      Boy, Girl                    Boy, Girl
      /   |   \                    /   |   \
   BSL  Date  Dance             GSL  Date  Dance
    |                            |
   GSL                          BSL

      (a)                          (b)

Figure 9-3. Two alternative trees using (Boy, Girl) to start decomposition,
and using (a) BSL, and (b) GSL to further decompose.

When BSL is used to decompose (BSL, GSL), we find the partial dependency
(BSL, Girl) $\twoheadrightarrow$ GSL in the scheme tree, and so we remove edge
(BSL, GSL) and create a new scheme tree with edge ((BSL, Girl), GSL). Similarly,
if GSL is used to decompose (BSL, GSL), we find the partial dependency
(GSL, Boy) $\twoheadrightarrow$ BSL in the scheme tree, and so we remove edge (GSL, BSL)
and create a new scheme tree with edge ((GSL, Boy), BSL). In both cases the
path set is not in 4NF. In the first case, (Boy, Girl, BSL) is decomposable
by MVD BSL $\twoheadrightarrow$ Boy, and in the second case, (Boy, Girl, GSL) is decomposable
by MVD GSL $\twoheadrightarrow$ Girl. However, the resulting scheme trees are in
nested normal form. In these cases, the initial decomposition seems better in
that we have the Date and Dance attributes directly associated with the (Boy,
Girl) pair. However, the additional tree that is created to solve the partial
dependency presents quite an unintuitive grouping of attributes. □

In our approach, the design algorithm will start with a 4NF decomposition
and will preserve that decomposition throughout the remainder of the
design. Thus, the primary point where different NNF designs will originate is
embodied in the well studied and understood creation of a 4NF decomposition.
We note that when the input set of dependencies is conflict free there is a
unique 4NF decomposition, and, therefore, our approach also produces a single
NNF design for this case. Let us consider Example 9.2 using a preview of our
approach.

Example  9.3:  (Continuation  of  Example  9.2.)  We  saw  that  it  seemed  best  to 
ensure  that  Date  and  Dance  were  associated  with  the  key  (Boy,  Girl)  and  so 
in  the  4NF  decomposition  we  use  the  key  (Boy,  Girl)  to  make  the  first  split. 
Thus,  we  decompose  into  schemes  (Boy,  Girl,  Date),  (Boy,  Girl,  Dance),  and 
(Boy,  Girl,  BSL,  GSL).  Now  we  can  use  the  other  two  MVDs  in  any  order 
to  decompose  (Boy,  Girl,  BSL,  GSL)  into  (BSL,  Boy),  (GSL,  Girl),  and  (BSL, 
GSL).  Considering  just  the  MVDs,  this  decomposition  is  in  4NF.  If  we  consider 
the  EMVD,  as  we  propose  to  do,  then  the  scheme  (BSL,  GSL)  is  decomposed 
into  BSL  and  GSL,  and  these  two  schemes  are  eliminated  since  they  are  proper 
subsets  of  other  schemes.  Our  method  will  then  proceed  to  create  a  scheme 
tree  for  each  scheme  in  the  4NF  decomposition,  as  shown  in  Figure  9-4. 

   Boy, Girl      Boy, Girl       BSL       GSL
       |              |            |         |
      Date          Dance         Boy       Girl

Figure 9-4. Initial scheme trees for each 4NF scheme of Example 9.3.

Then we combine scheme trees when the common attributes of two
trees form the same root to non-leaf path in both trees. In this example, we
combine the two trees with root (Boy, Girl) and our final design is a set of
three scheme trees as shown in Figure 9-5. This design more clearly depicts
the intended relationships and came about partially due to the fact that we
carefully selected the 4NF decomposition that was appropriate for this case.
In the approach of [OY1], this kind of decision making can only go into the
choice of fundamental key selection, and there is no way to produce the scheme
trees of Figure 9-5, no matter what choices are made. We note that if we did
not allow the EMVD to influence our 4NF decomposition, then we would have
had an additional edge relating BSL and GSL in either the BSL-Boy tree or
the GSL-Girl tree. These trees would still be more intuitive, and are equally
unattainable using [OY1]. □
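
The decomposition steps of Example 9.3 can be traced with a short Python
sketch. This is a deliberately naive illustration, not the algorithm of this
chapter nor a complete 4NF procedure: it applies only the MVDs that are listed
(in list order, which stands in for the designer's choice of first split)
rather than all implied MVDs, it uses an EMVD only on a scheme that equals its
attribute set, and all function names are hypothetical.

```python
def fs(*attrs):
    return frozenset(attrs)


def split_once(scheme, mvds):
    """Split `scheme` with the first MVD X ->> Y that is nontrivial on it."""
    for lhs, rhs in mvds:
        y = (rhs & scheme) - lhs
        rest = scheme - lhs - y
        if lhs <= scheme and y and rest:
            return [lhs | y, lhs | rest]
    return None


def decompose(scheme, mvds):
    """Recursively split until no listed MVD applies (a naive sketch)."""
    parts = split_once(scheme, mvds)
    if parts is None:
        return [scheme]
    return [s for p in parts for s in decompose(p, mvds)]


def apply_emvds(schemes, emvds):
    """Use an EMVD X ->> Y | Z on a scheme equal to XYZ, where it is full."""
    out = []
    for s in schemes:
        for x, y, z in emvds:
            if s == x | y | z and (y - x) and (z - x):
                out.extend([x | y, x | z])
                break
        else:
            out.append(s)
    return out


def drop_subsumed(schemes):
    """Discard schemes that are proper subsets of another scheme."""
    return [s for s in schemes if not any(s < t for t in schemes)]


U = fs("Boy", "Girl", "Date", "Dance", "BSL", "GSL")
mvds = [
    (fs("Boy", "Girl"), fs("Date")),
    (fs("Boy", "Girl"), fs("Dance")),
    (fs("BSL"), fs("Boy")),
    (fs("GSL"), fs("Girl")),
]
emvds = [(fs(), fs("BSL"), fs("GSL"))]  # the EMVD: empty set ->> BSL | GSL

four_nf = decompose(U, mvds)
final = drop_subsumed(apply_emvds(four_nf, emvds))
```

With the (Boy, Girl) MVDs listed first, the sketch arrives at the five schemes
named in the example, and the EMVD step followed by subset elimination leaves
the four schemes for which the initial scheme trees are drawn.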

      Boy, Girl          BSL       GSL
       /     \            |         |
    Date     Dance       Boy       Girl

Figure 9-5. Final scheme trees using new approach to design of Example 9.3.

9.1 Definitions and Basic Procedures

The first procedure we will need for our algorithm is a 4NF decomposition
procedure. Several have been proposed; however, we require one that deals
with both FDs and MVDs and does not treat FDs as MVDs in the design,
thereby ignoring the different semantics that FDs impose. Two approaches are
available, one by Beeri and Kifer [BeK1] and Katsuno [Kat], and the other by
Yuan and Ozsoyoglu [YO]. In the first approach, given a set D of FDs and
MVDs, a new set M' of MVDs is formed by first obtaining the full version of
the MVDs in D, and then replacing the left-hand side X of each MVD in the
full version by the closure of X with respect to D.

In the second approach, given a set D of FDs and MVDs over a set
U of attributes, a new set E(D) of MVDs, called an envelope set, is created, so
that E(D) represents the structural dependencies in D relevant to the design
process.

Definition 9.1: The envelope set E(D) of a set D of FDs and MVDs is

$$E(D) = \{X \twoheadrightarrow W \mid X \in LHS(D) \text{ and } W \in DEP_D(X) \text{ and } D \not\models X \to W\}.$$

If a database scheme is 4NF with respect to E(D) then it is also 4NF (BCNF
if D has FDs only) with respect to D. Thus, a database scheme for D can be
obtained by using E(D) as input to any 4NF decomposition algorithm [Fag2,
Lie1, GR, OY1], without considering the FDs and MVDs separately.

[Figure 9-6. Scheme tree (a) $T_1$, and (b) $T_2$.]

We must consider which of these two approaches to the design of flat
databases will help us most in forming better ¬1NF designs. As shown in
Example 9.1, in an NNF design FDs cause singleton sets to appear if the MVD
represented by an edge in a scheme tree is also an FD. In general, nesting is
not necessary when for some edge $(u, v)$, $u \to v$ holds. Consider the scheme
tree $T_1$ shown in Figure 9-6a. If $B \to D$ holds, then each B value will have a
single D value associated with it. Therefore, there is no need to nest D values,
allowing the tree $T_2$, shown in Figure 9-6b, which has a smaller structure and
is consistent with $T_1$ in that $MVD(T_1) \Rightarrow MVD(T_2)$.

In the first approach described above, a similar operation takes place
in the closing of the left hand sides of the MVDs. Attributes in the dependency
basis of a left hand side X which are functionally determined by X are
moved to the left to form the closure of X. Looking at $T_1$ and $T_2$ we see that
$MVD(T_1) = \{A \twoheadrightarrow BDE \mid C,\ AB \twoheadrightarrow D,\ AB \twoheadrightarrow E\}$. If we make the MVDs full
and close the left hand sides according to the FD $B \to D$, then we get the
set $M' = \{A \twoheadrightarrow BDE \mid C,\ ABD \twoheadrightarrow E \mid C\}$. Clearly, these MVDs are the MVDs
found in $MVD(T_2)$. Thus, the closure has the effect of associating functionally
determined attributes with keys used to create the ¬1NF hierarchies.

In the second approach, the envelope set of MVDs is used to represent
the FDs and MVDs. Here, components of full MVDs are eliminated if those
components are also FDs. There is no attempt to associate functionally
determined attributes with the keys and so the envelope set will not help in
eliminating singleton sets from our designs. For the above example,

$$E(MVD(T_1) \cup \{B \to D\}) = \{A \twoheadrightarrow BDE \mid C,\ AB \twoheadrightarrow D \mid E \mid C,\ B \twoheadrightarrow ACE\},$$

and this set of MVDs would not help us in achieving $T_2$. Therefore, we adopt
the first approach and use the set of MVDs obtained by closing the left hand
sides of the MVDs implied by the given set of dependencies to obtain our
initial decomposition. We use M' to represent the set of MVDs produced by
this approach.
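
The closing of left hand sides can be sketched in Python as follows. This is
a simplified illustration under stated assumptions: attributes are single
characters, the closure is computed from the FDs alone (the interaction of
FDs with MVDs that a closure with respect to all of D may involve is not
modeled), the MVDs are assumed to be supplied already in full form, and the
function names are hypothetical.

```python
def closure(attrs, fds):
    """Closure of `attrs` under a list of FDs given as (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def close_left_sides(full_mvds, fds):
    """Replace the left side X of each full MVD X ->> Y1 | ... | Yk by its
    closure and drop the attributes that moved to the left."""
    out = []
    for lhs, blocks in full_mvds:
        x = closure(lhs, fds)
        remaining = [set(b) - x for b in blocks]
        out.append((x, [b for b in remaining if b]))
    return out


# The running example: FD B -> D and the full MVDs
# A ->> BDE | C and AB ->> D | E | C.
fds = [("B", "D")]
full = [("A", ["BDE", "C"]), ("AB", ["D", "E", "C"])]
m_prime = close_left_sides(full, fds)
```

For the running example the sketch moves D to the left of the second MVD,
giving A ->> BDE | C and ABD ->> E | C, which agrees with the set M' stated
in the text.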

Since we desire to include EMVDs in our design algorithm, we must
perform some additional steps to achieve our final decomposition. Let U be the
set of attributes to be used in the design, D a set of given FDs and MVDs, and
F a set of given EMVDs. First, using the known inference rules, we generate
all FDs, MVDs, and EMVDs that are implied by the EMVDs or the EMVDs
together with any known FDs or MVDs. The new FDs and MVDs are added
to the original set D and the new EMVDs are added to the original set F.
We compute M', the set of MVDs obtained by making the MVDs implied
by D full and closing the left hand sides. We also close the left hand sides
of the EMVDs in F using the FDs in D. We then use M' as input to one
of the usual 4NF decomposition algorithms. This results in a set of schemes
$\mathcal{R} = \{R_1, R_2, \ldots, R_n\}$.

Each scheme in $\mathcal{R}$ may have one or more FDs implied by D embedded
within it. In their flat database design, [BeK1] use these FDs to synthesize a set
of schemes for each scheme of $\mathcal{R}$, further eliminating redundancy by achieving a
3NF decomposition. We do not want to take the additional step of synthesizing
3NF schemes when designing ¬1NF relations since organizing these schemes
in a scheme tree will only introduce singleton sets. However, when we allow
EMVDs to influence our design, we will have to consider further decomposition
based on the FDs in each scheme of $\mathcal{R}$. The reason for this is that we cannot
use an EMVD in the design process unless at some stage in the decomposition
it becomes a full MVD. For example, if $U = ABCDE$, then we cannot use
the EMVD $A \twoheadrightarrow B \mid CD$ until U is decomposed into a scheme which does not
have E in it. Thus, we perform two checks for the EMVDs in F following our
decomposition into $\mathcal{R}$ with respect to M'. First, if an EMVD, $F_i$, becomes a
nontrivial MVD when $F_i$ is projected onto some scheme in $\mathcal{R}$, say $R_j$, then $F_i$
is used to decompose $R_j$, and we replace $R_j$ with its decomposition in $\mathcal{R}$. This
process is repeated until no more EMVDs can be used to decompose schemes
in $\mathcal{R}$. Second, for each scheme $R_i$ in $\mathcal{R}$ we perform a temporary 3NF synthesis
on $R_i$, obtaining the schemes $S = S_1, S_2, \ldots, S_k$. If an EMVD, $F_j$, becomes a
nontrivial MVD when $F_j$ is projected onto some scheme in S, say $S_l$, then we
use $F_j$ to decompose $S_l$ and add the decomposition to $\mathcal{R}$. This continues until
all schemes in S have been considered. If any schemes remain in S then we
take the union of those schemes and replace $R_i$ with this union. If all schemes
still remain in S, then $R_i$ is replaced with itself and no change is made to $\mathcal{R}$.
The remainder of the ¬1NF design uses M', F, and $\mathcal{R}$.

Example 9.4: [Ull] Let U = SPYC, D = {SP → Y}, and F = {C →→ S | P},
where C is a course taken by a student S, and the course has prerequisite P
taken by the student in year Y. There are no nontrivial FDs or MVDs implied
by the EMVD, so we find M' = {SPY →→ C}. Using this set as input to a 4NF
decomposition algorithm, we get the scheme SPYC. Since this is the original
set U, the EMVD is still an EMVD for this scheme. In the next step, we perform
a synthesis on this scheme and get the decomposition {SPY, SPC}. Since the
EMVD in F is an MVD for scheme SPC, we use it to decompose SPC into
CS and CP. The final decomposition is {CS, CP, SPY}, and using our NNF
algorithm we get the scheme trees shown in Figure 9-7a. In comparison, the
scheme tree produced by the NNF algorithm of [OY1] is shown in Figure 9-7b.


Figure 9-7. Scheme trees for Example 9.4 using (a) our approach, and (b) [OY1].

□
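
The check "an EMVD becomes a nontrivial MVD when projected onto a scheme" can be illustrated with a minimal Python sketch. It assumes an EMVD X →→ Y | Z is usable on a scheme only when the scheme contains X and has no attributes outside X ∪ Y ∪ Z; the function name `project_emvd` and the frozenset encoding are illustrative choices, not part of the original algorithm.

```python
from typing import FrozenSet, Optional, Tuple

Attrs = FrozenSet[str]

def project_emvd(x: Attrs, y: Attrs, z: Attrs,
                 scheme: Attrs) -> Optional[Tuple[Attrs, Attrs]]:
    """Decompose `scheme` with the EMVD x ->-> y | z if it is a nontrivial
    full MVD on that scheme; otherwise return None (hypothetical helper)."""
    if not x <= scheme or not scheme <= (x | y | z):
        return None  # still embedded: attributes outside x, y, z remain
    left, right = (y & scheme) - x, (z & scheme) - x
    if not left or not right:
        return None  # the projected MVD would be trivial
    return x | left, x | right

def fs(letters: str) -> Attrs:
    """Shorthand: one attribute per letter."""
    return frozenset(letters)

# Example 9.4: C ->-> S | P is unusable on SPYC but decomposes SPC.
on_u = project_emvd(fs("C"), fs("S"), fs("P"), fs("SPYC"))
on_spc = project_emvd(fs("C"), fs("S"), fs("P"), fs("SPC"))
```

With the data of Example 9.4, `on_u` is `None` because Y lies outside the EMVD, while `on_spc` gives the two schemes CS and CP.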

Once we have a decomposition R with respect to U and M' ∪ F, we
start the process of designing ¬1NF relations which are in nested normal form.
We start by forming the trivial NNF design consisting of a single scheme tree
for each 4NF scheme in R. Each scheme tree is trivial since it will consist of
a single path and, therefore, its edges will specify trivial EMVDs. In a later
step, we will combine scheme trees to achieve a nontrivial design.

In order to maintain NNF, even in a trivial design, we cannot decompose
each 4NF scheme arbitrarily. This is due to the requirement that only
fundamental keys (see section 3.4.2) be used as the non-leaf nodes of a scheme
tree. Thus, we use a simplified version of the DECOMP procedure in [OY1] to
perform the decomposition. The procedure is much simpler since the input is
a set of attributes forming a 4NF scheme and there is no possibility of a split
key appearing among these attributes (a condition checked for in the original
procedure), and there is exactly one dependent in the dependency basis of any
attribute set which is a subset of a 4NF scheme. We first provide a new
definition for “fundamental keys,” since we also need to deal with the set F of
EMVDs which hold in the database. We also improve the definition by
preferring fundamental keys which are not projections of some essential key. For
example, if A and BC are essential keys and if we are finding the fundamental
keys of ABD, then we would prefer to decompose based on the fundamental
key A rather than B, since A represents a more complete relationship than B
which is a projection of BC.

Definition 9.2: Given a set M' of MVDs, a set F of EMVDs, and a set U of
attributes with V ⊆ U, the set of candidate fundamental keys of V, denoted
CFK(V), is defined as follows:

CFK(V) = {W | W ∈ LHS(M') ∨
          (W ∈ LHS({F'}) ∧ F' ∈ F ∧ proj_V(F') is a nontrivial MVD for V)}.

Out of CFK(V) we prefer those keys that are minimal subsets of V and if
there are none, we use the minimal intersections of those keys with V. The
preferred fundamental keys of V, denoted PFK(V), and all fundamental keys
of V, denoted FK(V), are defined as follows:

PFK(V) = {X | X ∈ CFK(V) ∧ X ⊂ V ∧
          ∄Y such that Y ∈ CFK(V) ∧ Y ⊂ X}

FK(V) = {W | X ∈ CFK(V) ∧ W = X ∩ V ∧ W ≠ ∅ ∧
         ∄Y such that Y ∈ CFK(V) ∧ Y ∩ V ⊂ W}.

These  modifications  to  the  definition  of  fundamental  keys  allow  for  the  fact  that 
an  EMVD  could  have  been  used  to  form  a  scheme  with  attributes  V.  Procedure 
DECOMP  can  now  be  specified  as  follows: 

Procedure DECOMP(V, T)
{V is a set of attributes which is a node in scheme tree T}
begin
    If V has 2 or more elements and FK(V) ≠ ∅ then
    begin
        (1) If PFK(V) ≠ ∅ then let V0 ∈ PFK(V)
            else let V0 ∈ FK(V);
        (2) W = V − V0;
        (3) Change V into V0 in T;
        (4) Attach W as a son of V0;
        (5) DECOMP(W, T);
    end
end.
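
The following Python sketch illustrates how PFK, FK, and DECOMP fit together. It is not the author's implementation: the candidate keys CFK(V) are passed in as a precomputed list `cfk`, a single-path tree is returned as a list of nodes from root to leaf, and two guards are assumptions added so the recursion makes progress (intersections with V must be nonempty and must not equal V). Ties are broken alphabetically only to keep the result deterministic.

```python
from typing import FrozenSet, Iterable, List

Attrs = FrozenSet[str]

def pfk(v: Attrs, cfk: List[Attrs]) -> List[Attrs]:
    """Preferred fundamental keys: candidate keys properly inside v, minimal."""
    inside = [x for x in cfk if x < v]
    return [x for x in inside if not any(y < x for y in cfk)]

def fk(v: Attrs, cfk: List[Attrs]) -> List[Attrs]:
    """Fundamental keys: minimal nonempty intersections of candidate keys with v.
    Assumption of this sketch: intersections equal to v are skipped."""
    cuts = {x & v for x in cfk if x & v and (x & v) != v}
    return [w for w in cuts if not any(c < w for c in cuts)]

def decomp(v: Iterable[str], cfk: List[Attrs]) -> List[Attrs]:
    """Iterative form of DECOMP: split v into a root-to-leaf path of nodes."""
    rest = frozenset(v)
    path: List[Attrs] = []
    while len(rest) >= 2:
        keys = fk(rest, cfk)
        if not keys:
            break
        v0 = min(pfk(rest, cfk) or keys, key=sorted)  # steps (1) and (3)
        path.append(v0)
        rest = rest - v0                              # step (2); recurse on W
    path.append(rest)                                 # remaining leaf node
    return path

# The example from the text: essential keys A and BC, scheme ABD.
example_path = decomp("ABD", [frozenset("A"), frozenset("BC")])
```

For the scheme ABD the preferred key A becomes the root; on the remainder BD only the projection B of BC is available, giving the path A, B, D.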

The final procedure we need for our design algorithm is a method for
combining scheme trees while maintaining NNF and the original 4NF
decomposition. We can combine scheme trees if the attributes that are in common to
both trees form the same path, u to v, in each tree, where u is the root and v is
a non-leaf node in both trees. If we have two trees that meet this requirement
then we can temporarily merge them into a single tree. If the merged tree
is free of transitive dependencies, then we let the merge become permanent.
After making all possible merges, we have our final NNF design. The MERGE
procedure is as follows:

Procedure MERGE(T1, T2, T3)
{T1, T2 are the two scheme trees whose common nodes form the same
 root to non-leaf path in both trees. T3 is the merged tree.}
begin
    T3 := T1;
    For each edge (v, w) in T2 do
        if (v, w) is not in T3 then
            add (v, w) and (if necessary) nodes v and w to T3
    end
end.
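
As a rough illustration of MERGE and its precondition, the Python sketch below represents a scheme tree as a pair (root, set of parent-child edges) with nodes as frozensets of attributes. The helper names `attrs`, `common_path`, and `merge` are hypothetical, nodes within a tree are assumed to be disjoint, and the test for transitive dependencies that decides whether a merge becomes permanent is not modeled.

```python
def attrs(root, edges):
    """All attributes appearing in the tree."""
    out = set(root)
    for parent, child in edges:
        out |= parent | child
    return frozenset(out)

def common_path(root, edges, common):
    """Root-to-node path covering exactly `common` and ending at a non-leaf
    node, or None if the common attributes do not form such a path."""
    path, node, covered = [], root, frozenset()
    while node <= common:
        path.append(node)
        covered |= node
        if covered == common:
            is_non_leaf = any(p == node for p, _ in edges)
            return path if is_non_leaf else None
        below = [c for p, c in edges if p == node and c & common]
        if len(below) != 1:
            return None
        node = below[0]
    return None

def merge(t1, t2):
    """Union of the edge sets, provided the precondition holds; else None."""
    (root1, edges1), (root2, edges2) = t1, t2
    common = attrs(root1, edges1) & attrs(root2, edges2)
    p1 = common_path(root1, edges1, common)
    if p1 is None or p1 != common_path(root2, edges2, common):
        return None
    return root1, edges1 | edges2

C, D = frozenset({"Class"}), frozenset({"Day"})
SM, E = frozenset({"Student", "Major"}), frozenset({"Exam"})
merged = merge((C, {(C, D)}), (C, {(C, SM), (SM, E)}))
```

Here the two trees share only the root Class, which is a non-leaf node in both, so `merged` carries all three edges under a single root.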


9.2  The  NNF  Design  Algorithm 

Using the procedures developed in the previous section, we can now specify our
NNF design algorithm as follows:

Algorithm 2
{ input: a set of attributes U, a set of MVDs and FDs D,
         and a set of EMVDs F.
  output: a set of scheme trees T1, T2, ..., Tn in NNF. }
begin
(1) Find a 4NF decomposition of U with respect to D ∪ F.
    (a) Add to D any FDs and MVDs which can be inferred from D ∪ F
        using the EMVDs in F.
    (b) Add to F any EMVDs which can be inferred from D ∪ F
        using the EMVDs in F.
    (c) Find a 4NF decomposition R = {R1, R2, ..., Rk} with respect to M'.
    (d) Decompose schemes in R according to any EMVDs in F
        which project as nontrivial MVDs on some scheme in R.
        Replace the decomposed schemes in R with their decomposition.
    (e) For i := 1 to k do
        begin
            (i) Synthesize a 3NF decomposition of Ri, S = {S1, S2, ..., Sm}.
            (ii) Decompose schemes in S according to any EMVDs in F
                 which project as nontrivial MVDs on some scheme in S.
                 Remove any decomposed scheme from S and add
                 the decomposition to R.
            (iii) Replace Ri in R with the union of the remaining schemes
                  in S.
        end
(2) Prepare initial scheme trees.
    (a) Initialize k scheme trees T1, T2, ..., Tk with no edges and
        single nodes labeled R1, R2, ..., Rk, respectively.
    (b) For i := 1 to k do DECOMP(Ri, Ti) end.
    (c) Let T = {T1, T2, ..., Tk}.
(3) Merge trees.
    Until no more changes can be made to T do
    begin
        (a) Select T^1 ∈ T and T^2 ∈ T, T^1 ≠ T^2,
            such that T^1 and T^2 have not been considered together.
        (b) If the common attributes of T^1 and T^2 form the same root to
            non-leaf path in both trees then
            begin
                (i) MERGE(T^1, T^2, T^3).
                (ii) If there are no transitive dependencies in T^3 then
                     T := T − {T^1, T^2} ∪ {T^3}
            end
    end
end.


9.3  Correctness  of  Algorithm  2 


In this section we show that the ¬1NF design produced by Algorithm 2 is in
nested normal form. To do this we need to show that the four requirements
of NNF hold for each relation in the design. Each scheme tree T of the design
must satisfy the following four properties:

(1) Inference property: D ∪ F ⇒ MVD(T), where D is the input set of
MVDs and F is the input set of EMVDs.

(2) PD property: There are no partial dependencies in T.

(3) TD property: There are no transitive dependencies in T.

(4) FK property: The root of T is a key, and for each other node u in T,
if FK(D(u)) ≠ ∅, then u ∈ FK(D(u)).

In the FK property, key refers to LHS(M').

We will prove these four properties hold after each major step of
Algorithm 2 in which scheme trees are created or modified.

Proposition  9.1:  The  four  properties  of  NNF  hold  after  step  2  of  Algorithm 
2  where  the  initial  scheme  trees  are  created. 

Proof: 

(1)  Inference  property:  Since  all  MVDs  and  EMVDs  in  MVD(T)  are 
trivial  for  the  single  path  trees  which  procedure  DECOMP  produces, 
this  property  holds  trivially. 

(2) PD property: Assume there is a partial dependency in T. Then the
path set of T can be decomposed using the partial dependency, and
therefore is not in 4NF. This contradicts the fact that we start with
all path sets being in 4NF as a result of step 1 of Algorithm 2.

(3) TD property: Trivially true, since the definition of transitive
dependency requires sibling nodes to exist in the tree, and there are none
in a single path tree.

(4) FK property: Procedure DECOMP creates non-leaf nodes which are
fundamental keys of the subtrees with those nodes as root. Thus, this
property holds by design. □

Proposition  9.2:  The  four  properties  of  NNF  hold  after  step  3  of  Algorithm 
2  where  scheme  trees  are  merged. 

Proof: Assume there are m trees T1, T2, ..., Tm at some stage of step 3. We
show that each property holds after two trees T1 and T2 are permanently merged
into tree T'.

(1) Inference property: Figure 9-8 shows two general trees T1 and T2
with common attributes u1, u2, ..., un forming the same root to non-leaf
path in both trees. Subtrees are summarized by the union of all
nodes in the subtree (e.g., Y_1^1, Z_1^1). Each of these trees is assumed to
be in NNF. Given that these trees are in NNF and the path sets are
in 4NF, the following JD holds:


Figure 9-8. General trees T1 and T2.


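⋈(u1 Z_1^1, ..., u1 Z_1^{q_1}, u1u2 Z_2^1, ..., u1u2···un Z_n^1, ..., u1u2···un Z_n^{q_n},
  u1 Y_1^1, ..., u1 Y_1^{p_1}, u1u2 Y_2^1, ..., u1u2···un Y_n^1, ..., u1u2···un Y_n^{p_n},
  S(T3), ..., S(Tm)),

where p_i and q_i denote the number of Y and Z subtrees attached to u_i.
This JD implies

u1 →→ Y_1^1 | ··· | Y_1^{p_1} | Z_1^1 | ··· | Z_1^{q_1},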

holds in T' which is the EMVD representing edge (u1, u2). Similarly,
the JD implies each edge (u_i, u_l), 1 ≤ i = l − 1 ≤ n − 1. Also,
the EMVDs represented by each edge in the Y and Z subtrees are
still implied by this JD. Therefore, MVD(T') holds and the inference
property is maintained.

(2) PD property: Holds as in Proposition 9.1, since we have not modified
the 4NF path sets by merging T1 and T2.

(3)  TD  property:  By  design,  we  specifically  test  that  this  property  is  not 
violated  before  we  merge  trees  permanently. 

(4) FK property: Even though some of the non-leaf u_i nodes may
have a new set of descendants consisting of the Y and Z subtrees
at that level, u_i will still be a fundamental key of V =
u_i Y_i^1 Y_i^2 ··· Y_i^{p_i} Z_i^1 Z_i^2 ··· Z_i^{q_i}. If u_i was a minimal intersection of a subset
of V and the keys of M', then it will be minimal for V. □

By Propositions 9.1 and 9.2, we know that relations designed using Algorithm
2 will be in nested normal form with respect to M' and F. Since D ⇒ M', and
M' ∪ F ⇒ MVD(T), we have a good representation of the original dependencies
in our design.

9.4  Further  Normalization  of  NNF  Relations 

Algorithm 2 produces a set of ¬1NF relations which is in NNF with respect to
a set of MVDs (M') and a set of EMVDs (F). M' was derived from a set D
of MVDs and FDs in step one of the algorithm. Later steps deal only with M'
and F, and ignore the FDs that existed in D. This was appropriate since we
incorporated the FDs by closing the left hand sides of the MVDs in D to obtain
M'. This eliminates the possibility of getting nested relations which will only
have a single tuple in them. However, there is still a place where redundancy
due to FDs arises in our ¬1NF design. The following example will illustrate
this problem.

Example 9.5: Consider the following university database taken from [Lie1].
We have attributes Class, Day, Hour, Tutor, Office, Student, Major, and Exam.
The dependencies which hold are

Class →→ Day
Student → Major
Tutor → Office
Class, Student → Exam
Class, Tutor →→ Hour

Following Algorithm 2, we make the MVDs full and close their left hand sides
giving M':

Class →→ Day | Hour, Tutor, Office, Student, Major, Exam
Class, Student, Major →→ Exam | Day | Hour, Tutor, Office
Class, Tutor, Office →→ Hour | Day | Exam, Student, Major

The only 4NF decomposition consists of the following five schemes:

Class, Day
Class, Student, Major, Exam
Class, Tutor, Office, Hour
Class, Tutor, Office, Student, Major

One of the two symmetric choices that Algorithm 2 produces for this set of
schemes is shown in Figure 9-9.

Figure 9-9. Scheme trees produced by Algorithm 2 for Example 9.5.

Although this design is in NNF with respect to M', there are some
obvious redundancies involving the nodes Tutor, Office and Student, Major.
Since the FD Student → Major holds, each time a Student value is repeated
for different Class values, the same Major value is also repeated. The situation
is worse for Tutor, Office. Since the FD Tutor → Office holds, each time a
Tutor value is repeated for different Class and Student values in one relation,
and for different Class values in the other relation, the same Office value is also
repeated. □

The problem with designs, such as those in this example, is that groups
of attributes which are nested will introduce redundancies if a subset of that
group functionally determines some other part of the group. Note that this
problem does not occur if the group is at the root of the scheme tree, since
these are the atomic attributes of the relation and their values will occur only
once in the relation.

Figure 9-10. Scheme trees produced by Algorithm 3 for Example 9.5.


The solution is to examine each scheme tree produced by Algorithm 2
for nodes N that exhibit the above behavior, replacing each N with the smallest
set of attributes which functionally determines N, and creating a new relation
with a single node containing the attributes involved in the redundant FD as
root and no branches. For Example 9.5, the new design would consist of the
four scheme trees shown in Figure 9-10. Below we give a new algorithm to
implement these changes.

Algorithm 3
{ input: a set T of scheme trees produced by Algorithm 2
         and the set of FDs G used as input to Algorithm 2
  output: a new set T of scheme trees in NNF
          with redundancies due to FDs removed. }
begin
    Until no more changes can be made to T do
    begin
        If there exists T ∈ T, where T contains a node N such that
        X → N is implied by G, with X ⊂ N then
        begin
            (a) Let Z ⊆ N where G ⇒ Z → N, and
                for no W ⊂ Z does G ⇒ W → N.
            (b) Let Y ⊆ Z where G ⇒ Y → N − (Z − Y), and
                for no W ⊂ Y does G ⇒ W → N − (Z − Y).
            (c) Modify T by replacing node N with Z.
            (d) Add a new tree to T with the single node N − (Z − Y)
                and no edges.
        end
    end
end.
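
Steps (a), (b), and (d) for a single node can be sketched in Python as follows. This is an illustration under stated assumptions rather than the algorithm itself: FDs are given as (left side, right side) pairs of frozensets, implication by G is tested with the standard attribute-closure computation, "smallest" is taken to mean fewest attributes with alphabetical tie-breaking, and the names `closure`, `minimal_determinant`, and `split_node` are hypothetical. Walking the trees and skipping root nodes, as the prose requires, is left out.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of `attrs` under FDs given as (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def minimal_determinant(node, fds):
    """Step (a): a smallest Z within `node` such that Z -> node."""
    for size in range(1, len(node) + 1):
        for cand in combinations(sorted(node), size):
            if node <= closure(cand, fds):
                return frozenset(cand)
    return frozenset(node)

def split_node(node, fds):
    """Return (Z, root of the new single-node tree), or None if no proper
    subset of `node` determines it."""
    z = minimal_determinant(node, fds)
    if z == node:
        return None
    for size in range(1, len(z) + 1):          # step (b): smallest Y within Z
        for cand in combinations(sorted(z), size):
            y = frozenset(cand)
            target = node - (z - y)            # step (d): N - (Z - Y)
            if target <= closure(y, fds):
                return z, target
    return None

G = [(frozenset({"Student"}), frozenset({"Major"})),
     (frozenset({"Tutor"}), frozenset({"Office"}))]
tutor_split = split_node(frozenset({"Tutor", "Office"}), G)
```

For the node Tutor, Office of Example 9.5 this yields the replacement node Tutor and a new single-node tree Tutor, Office, matching the description of Figure 9-10.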


It is straightforward to show that the scheme trees produced by
Algorithm 3 are still in NNF with respect to M', and if we consider the FDs used to
change the trees then the path sets are still in 4NF, and so the join dependency
among the path sets continues to hold.


Chapter  10 
Conclusion 


In  this  chapter  we  summarize  the  results  presented  in  this  dissertation,  and 
provide  direction  for  future  work  in  this  area. 

10.1  Summary  of  Results 

Dropping the 1NF assumption in relational databases is not a trivial step to
take. The added complexity requires a thorough reexamination of the body of
relational database theory that has already been developed for 1NF databases.
In addition, there is ample opportunity for exploring new techniques specific
to a ¬1NF model. One advantage of our ¬1NF model is the orthogonality
of the change made to the 1NF model. Instead of allowing just any data
structure to model the decomposable values that we now permit in relations,
we choose the relation as that structure. This allows a large bulk of the current
theoretical results for relational databases to be applied in a recursive manner
to ¬1NF databases. As we illustrated in Chapter 3, the extension to allow
nested relations is quite adequate for modeling a large variety of database
problems, as well as improving the design of traditional databases. As a result
of our research, we make several important contributions in the ¬1NF relational
database area.

In Chapter 4, we defined an extended relational calculus for use with
¬1NF relations. This calculus forms a theoretical basis for the expressive power
of a ¬1NF query language. In Chapter 6, we found that the calculus is
equivalent to the basic relational algebra extended with nest and unnest operators,
thus verifying the power of extended algebras proposed by other researchers
[JS, FT]. In Chapter 5, we defined a normal form for ¬1NF relations, called
partitioned normal form. Just as traditional relations are assumed to be in
1NF, we believe that ¬1NF relations should be assumed to be in PNF. PNF is
a basic normal form which states that the atomic-valued attributes of a relation
are a key for that relation. This assumption has several desirable consequences.
First, we have the intuitive semantics that more than one nested set of values
should not be associated with the same set of atomic values. If that seems to
be the case, then there is one or more hidden attributes that have not been
properly included in the database design. Second, PNF relations always have
the property that nest is an inverse for unnest, whereas, in general, that is not
true. This means that information is preserved when restructuring relations.
Finally, it is straightforward to define a set of extended relational algebra
operators which are closed under the set of PNF relations, and maintain the data
dependencies which underlie the structure of each PNF relation. In addition,
these operators are faithful and precise generalizations of the standard
operators with respect to unnesting. Therefore, we can use an extended operator to
achieve the same result as unnesting a relation, applying the standard operator,
and then renesting.

Since the ¬1NF model allows us to represent multiple relationships in
a single relation without the redundancy that exists in a 1NF model, it is critical
that we allow null values in our model. This way, if one relationship is unknown
or nonexistent, then we can still store another relationship that is known. In
Chapter 7, we examined the role of null values in a 1NF model, and extended
those results to the ¬1NF model. One of the critical issues here was how to
handle the empty nested relation. We found that the empty nested relation is
equivalent to a relation with a single null tuple in it. A null tuple consists of
no-information null values for the atomic-valued attributes and, recursively, nested
relations with single null tuples for the set-valued attributes. This means that
unnest is still defined properly even when unnesting an empty nested relation.
Unlike other approaches which eliminate tuples when an empty nested relation
is unnested, our approach preserves information in the relation. We also gave
new definitions for the extended algebra operators so that they deal with null
values. We found that these null-extended operators are faithful and, in the
case of union and projection, precise generalizations of the standard operators
with respect to unnesting and an open world possibility function. We found a
null-extended natural join which is an adequate and restricted generalization
of standard natural join and a null-extended difference which is an adequate
and restricted generalization of standard difference. Finally, we found that
arguments which have led to new axiomatizations for functional and multivalued
dependencies in the presence of null values are based on incorrect assumptions
about the nature of null values. We showed that the usual axiomatizations are
valid, and should be used for dependency inference even when null values are
present.
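
The two properties summarized above (nest inverts unnest on PNF relations, and an empty nested relation behaves like a single null tuple) can be illustrated with a small Python sketch. It is a toy model with assumptions of its own: a relation has one atomic attribute and one set-valued attribute, Python's `None` stands in for the no-information null, and mapping a lone null back to the empty set in `nest` is a choice made for this sketch.

```python
NULL = None  # stand-in for the no-information null (assumption of this sketch)

def unnest(nested):
    """Flatten a set of (atom, frozenset_of_values) tuples into (atom, value)
    pairs. An empty nested relation is treated as one null tuple, so the
    outer tuple is not lost."""
    return {(atom, v) for atom, values in nested for v in (values or {NULL})}

def nest(flat):
    """Group (atom, value) pairs by atom; a lone NULL maps back to the empty set."""
    groups = {}
    for atom, v in flat:
        groups.setdefault(atom, set()).add(v)
    return {(atom, frozenset(vs - {NULL})) for atom, vs in groups.items()}

# PNF: the atomic attribute is a key, so each atom has exactly one nested set.
pnf = {("s1", frozenset({"c1", "c2"})), ("s2", frozenset())}
# Not PNF: the atom s1 is associated with two different nested sets.
non_pnf = {("s1", frozenset({"c1"})), ("s1", frozenset({"c2"}))}

roundtrip_pnf = nest(unnest(pnf))
roundtrip_non_pnf = nest(unnest(non_pnf))
```

The PNF relation survives the round trip unchanged, including the tuple for `s2` with its empty nested set, whereas the non-PNF relation collapses into a single tuple for `s1`.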


In Chapter 8, we defined a ¬1NF user language, called SQL/NF, which
is based on the commercial database language SQL. SQL is a powerful query
language, responsible for a lot of the current acceptance of relational databases.
SQL-like languages are easy to learn and provide improved data independence
over former database query languages. We have extended SQL to enhance
its ease of use and expanded its expressiveness to deal with ¬1NF relations.
Since our extensions keep the relational model “pure” in that all data is
represented as relations, or, recursively, as relations within relations, there are no
longer two types of structures: relations and partitioned relations (created by
“GROUP BY”). This consistency makes application of functions and use of
SFW-expressions straightforward and logical. In summary, the major
advantages of our language are

(1) Orthogonality of expressions. Wherever a relation could logically
occur, a SFW-expression is allowed.

(2) Orthogonality of functions. Functions are applied to relations, and
not to attributes which stood for relations.

(3) Arbitrary restructuring of relations via the nest and unnest operators.

(4) Elimination of “GROUP BY” and “HAVING” clauses.

(5) Use of reference names to simplify queries and to rename attributes.

(6) More complete and logical treatment of null values, including a
method for performing outer joins and elimination of subsumed tuples.

(7) Upward compatibility from a strict 1NF system, in which
SFW-expressions in the SELECT clause must evaluate to single values, and
relation-values are not allowed in the database.

Finally, we looked at the design of ¬1NF relations using the criteria
of nested normal form. NNF eliminates anomalies due to partial and transitive
redundancies in PNF relations. In Chapter 9, we presented a new algorithm
for achieving an NNF design. Our approach has the advantage of using a
4NF decomposition as input to the algorithm. This gives the user more
control over the final design of the ¬1NF relations, by allowing him to choose
the 4NF decomposition which emphasizes the data associations he considers
most critical. We also made several improvements to the design process by
considering embedded multivalued dependencies in coming up with the 4NF
decomposition, and by better utilizing functional dependencies in the design of
the ¬1NF relations. Functional dependencies are especially important in that
a naive approach to their use will create many single tuple nested relations and
unnecessary redundancy in the design.

10.2 Directions for Future Work

We feel there are three primary areas where future work is necessary for ¬1NF
relations. These areas are extensions to the model to include recursive schemes,
the recursive algebra and optimization of the algebra and SQL/NF, and
implementation. We briefly describe the problems in these three areas below.

10.2.1  Recursive  Schemes 

In the ¬1NF model defined in Chapter 3, we restrict relation schemes to be
nonrecursive. A possible direction for future work is the elimination of this
restriction and the study of the ensuing consequences. There are many
situations for which recursive schemes would be an appropriate model. Consider
a management hierarchy. This can be represented by the recursive scheme:
Employee = (name, Employee). In this relation scheme each employee has a
name and a set of employees who work for him. The recursion stops when, at
the bottom of the hierarchy, an employee has no one working for him and so his
Employee set is empty. Problems involve querying such a relation, referencing
the same attribute at many different levels, and redundancy of the hierarchy if
an employee works for more than one person. Special operators will be needed
to search the hierarchy, to merge sets at different levels, and to do transitive
closures [Zlo]. In [Jae1, Jae2], recursive schemes are allowed in the more
powerful database logic, but operators on these schemes are not easy to formulate
or understand by database users, and the problems mentioned are not entirely
solved.
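
To make the recursive scheme Employee = (name, Employee) concrete, here is a minimal Python sketch using a self-referential dataclass together with one of the "special operators" mentioned above, a transitive-closure search over the hierarchy. The class layout and the function name `all_reports` are illustrative assumptions, not a proposal from the text; tracking visited names is one simple way to cope with an employee who appears under more than one manager.

```python
from dataclasses import dataclass, field
from typing import List, Set

@dataclass
class Employee:
    name: str
    # The nested Employee relation: the people who work for this employee.
    # An empty list ends the recursion at the bottom of the hierarchy.
    employees: List["Employee"] = field(default_factory=list)

def all_reports(manager: Employee) -> Set[str]:
    """Names of everyone below `manager` at any level (transitive closure)."""
    found: Set[str] = set()
    stack = list(manager.employees)
    while stack:
        current = stack.pop()
        if current.name not in found:   # skip employees already reached
            found.add(current.name)
            stack.extend(current.employees)
    return found

cy = Employee("Cy")
boss = Employee("Ann", [Employee("Bob", [cy]), Employee("Dee", [cy])])
reports = all_reports(boss)
```

In this example Cy works for both Bob and Dee, which is the redundancy the text warns about, yet the search reports each name once.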

10.2.2  Recursive  Algebra  and  Optimization 

In Chapter 4, we defined an extended relational algebra which made no changes
to existing operators and added nest and unnest. Although this algebra is
powerful enough to operate in a ¬1NF environment, it lacks the convenience
that is provided by a recursive algebra [Jae3, ScS1]. Using the extended algebra
a relation must be unnested in order to perform operations on the tuples or
attributes of the nested relations. In a recursive algebra, operations on nested
relations may be nested within the projection operator. Although the recursive
algebra has the same expressive power as the nonrecursive algebra [Jae2, Jae3],
the convenience of the recursive algebra outweighs the simple semantics of the
nonrecursive algebra. In addition, the recursive algebra more closely matches
the structure of SQL/NF, and so it is a more appropriate vehicle for optimizing
SQL/NF queries. Research is needed to investigate optimizing recursive algebra
queries and translating SQL/NF queries into the recursive algebra so they can
be optimized and executed. Some optimization work involving operators similar
to nest and unnest for statistical databases was done in [OMO].

10.2.3 Implementation

Some  implementation  work  for  ->1NF  relations  has  been  done  in  the  Federal 


Republic  of  Germany  [BR,  DGW,  D+,  GP,  Sch2],  France  [AB2,  B+],  and  the 
United  States  [Bat].  This  field  is  still  wide  open  for  the  development  of  new 
techniques  for  storing  and  accessing  data  in  ->1NF  relations.  Storage  structures, 
access  techniques,  indexing  of  nested  relations,  concurrency  control,  and  human 
interfaces  are  all  areas  where  more  work  is  required  to  make  ->1NF  a  viable 
alternative  to  existing  models.  Furthermore,  the  view  mechanism  for  INF 
databases  must  be  expanded  for  -<1NF  databases,  and  the  update  problem  for 
views  must  be  reexamined.  We  believe  that  lower  redundancy  of  data  and  the 
reduced  use  of  join  operations  will  more  than  make  up  for  the  added  complexity 
in  storing  and  accessing  -ilNF  relations. 
In this appendix we prove that a powerset operation is not achievable using
the operators of the extended relational algebra presented in Chapter 4. A
powerset operation finds all subsets of a given set. If a set has n elements then
the powerset has $2^n$ elements. We assume that the given set is a relation r
with each tuple of r containing one of the elements of the set. Thus, r has
n tuples, or |r| = n. Our goal for a powerset operator would be to create a
relation with $2^n$ tuples, each tuple containing a set of values from the original
relation. Our strategy is to show that there is no extended algebra expression
E which can create an exponential number of tuples where E operates only on
r and constant relations. We do this in two steps. First, we show that if E
has k operators then the number of tuples in the relation formed by $\mu^*(E)$ is
$O(n^{k+1})$. Then, since E must have equal to or fewer tuples than $\mu^*(E)$, |E| will
also be $O(n^{k+1})$. Since we must fix k in any expression, we can always find a
relation r such that $2^n > |E|$, and so E cannot be a powerset operation.
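
To make the counting concrete, the following Python sketch enumerates the powerset of a small one-attribute relation and checks the $2^n$ cardinality. It is only an illustration: the function name `powerset` and the sample relation are assumptions introduced here, and the computation is done in ordinary Python, not in the extended algebra.

```python
from itertools import combinations

def powerset(r):
    """Return every subset of relation r, each subset as a frozenset of tuples."""
    tuples = list(r)
    return {
        frozenset(subset)
        for size in range(len(tuples) + 1)
        for subset in combinations(tuples, size)
    }

# Hypothetical relation with n = 4 single-attribute tuples.
r = {("a",), ("b",), ("c",), ("d",)}
subsets = powerset(r)
print(len(subsets))  # 16, i.e. 2**4
```

Each added tuple in r doubles the number of subsets, which is the growth rate the proof shows no fixed algebra expression can match.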

Lemma A-1. Given a relation r and an extended algebra expression E, |r| = n,
E has k operators; the number of tuples in the relation formed by $\mu^*(E)$ is
$O(n^{k+1})$.

Proof: The proof is by induction on the number of operators k in expression
E.

Base case: k = 0: trivial, $|\mu^*(E)| = O(n)$ as E can only be r or a constant
relation.

Induction step: There are two cases, one for unary operators and one for binary
operators.

Case 1: Unary operator ($\sigma$, $\pi$, $\nu$, $\mu$). Assume we have an expression E' with
j operators where $|\mu^*(E')| = O(n^{j+1})$. Consider the expression
$E = \theta(E')$, where $\theta$ is one of the unary operators. The operators $\sigma$, $\pi$, and
$\nu$ cannot increase the number of tuples in E' and so $\mu^*(E)$ will have
no more tuples than $\mu^*(E')$. Therefore, $|\mu^*(E)|$ is also of order $n^{j+1}$ and
so is certainly of order $n^{(j+1)+1}$. If $\theta$ is $\mu$, then, even though $|E| \geq |E'|$,
we will have $|\mu^*(E)| = |\mu^*(E')|$. This is because the explicit unnest
operation performed in E is also present in the $\mu^*$ operation on E'.
Thus, as for the other unary operators, $|\mu^*(E)| = O(n^{(j+1)+1})$.

Case 2: Binary operator ($\times$, $\cup$, $-$). Assume we have two expressions E' with
l operators and E'' with m operators, where l + m = j, $|\mu^*(E')| = O(n^{l+1})$,
and $|\mu^*(E'')| = O(n^{m+1})$. If $E = E' - E''$ then there are no
more tuples in E than there are in E' and so $|\mu^*(E)| = O(n^{l+1})$, which
implies that $|\mu^*(E)| = O(n^{(j+1)+1})$. Cartesian product and union can
increase the size of the new relation. For Cartesian product the new
size will be the product of the sizes of the two operands, and for union
the new size will be at most the sum of the sizes of the two operands.

First, let $E = E' \times E''$. Since $\mu^*(E' \times E'') = \mu^*(E') \times \mu^*(E'')$ [FT],
we have

$$
\begin{aligned}
|\mu^*(E' \times E'')| &= O(n^{l+1})\,O(n^{m+1}) \\
&= O(n^{l+m+2}) \\
&= O(n^{(j+1)+1})
\end{aligned}
$$

For union the result is similar, except that we sum the cardinalities
of the operands, which is certainly of order of the product.

In each case we find that given an expression(s) with j operators with cardinality
of order $n^{j+1}$, if we add one more operator then the cardinality is of order
$n^{(j+1)+1}$. This proves the induction and the lemma is proved. □
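
The case analysis of the proof can be read as a recursive rule for bounding the exponent of n. The Python sketch below applies that rule to a small expression tree and compares the result with k + 1, where k is the operator count. The class names (`Leaf`, `Unary`, `Binary`) and the operator labels are assumptions made for this illustration, and union uses the larger of the two operand exponents, which is a slightly tighter bound than the product bound used in the proof.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Leaf:
    """The relation r or a constant relation: O(n) tuples after total unnest."""
    name: str

@dataclass(frozen=True)
class Unary:
    """A select, project, nest, or unnest applied to one operand."""
    op: str
    arg: "Expr"

@dataclass(frozen=True)
class Binary:
    """A product, union, or difference applied to two operands."""
    op: str
    left: "Expr"
    right: "Expr"

Expr = Union[Leaf, Unary, Binary]

def operators(e):
    """Count the operators k in expression e."""
    if isinstance(e, Leaf):
        return 0
    if isinstance(e, Unary):
        return 1 + operators(e.arg)
    return 1 + operators(e.left) + operators(e.right)

def degree(e):
    """Return d such that the total unnest of e has O(n**d) tuples."""
    if isinstance(e, Leaf):
        return 1                                   # base case: O(n)
    if isinstance(e, Unary):
        return degree(e.arg)                       # Case 1: no growth
    if e.op == "product":
        return degree(e.left) + degree(e.right)    # sizes multiply
    if e.op == "union":
        return max(degree(e.left), degree(e.right))  # sizes add
    if e.op == "difference":
        return degree(e.left)                      # bounded by left operand
    raise ValueError(f"unknown operator {e.op!r}")

r = Leaf("r")
triple = Binary("product", Binary("product", r, r), r)   # r x r x r
print(operators(triple), degree(triple))  # 2 3
```

The triple product is the worst case for two operators: its exponent equals k + 1, and no other choice of two operators exceeds it.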

Theorem A-1. Given a relation r, there is no expression in the extended
relational algebra which can compute a powerset operation on r.

Proof: Suppose there was an expression P that could perform a powerset.
Then $|P| = 2^n$, where |r| = n. By Lemma A-1, we know that P must have
cardinality which is of order that is polynomial in n, specifically $O(n^{k+1})$, where
k is the number of operators in P. Since we can choose n such that $2^n >
O(n^{k+1})$, it is not possible that P computes the powerset of r. □
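
As a numeric illustration of the final step, the Python sketch below finds, for a fixed operator count k, the first n at which $2^n$ overtakes $n^{k+1}$. The function name `crossover` is an assumption, and the sketch ignores the constant factor hidden in the O-notation; a larger constant moves the threshold but does not remove it.

```python
def crossover(k):
    """Smallest n >= 2 with 2**n > n**(k + 1), using exact integers."""
    n = 2
    while 2 ** n <= n ** (k + 1):
        n += 1
    return n

for k in (1, 2, 3):
    print(k, crossover(k))
# 1 5
# 2 10
# 3 17
```

For any fixed k the loop terminates, which is the informal content of the theorem: an expression with k operators cannot keep up with the powerset once r is large enough.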
The following is a modified BNF definition of the query facilities, DML,
and DDL in SQL/NF. We used RDL [X3H2] as a baseline definition.
Non-distinguished symbols are enclosed with “⟨⟩”. The structure [ ] indicates an
optional entry, and the structure “...” indicates an additional zero or more
repetitions of the previous entry. Braces are used for grouping in the BNF.
Except where modified by braces, sequencing has precedence over disjunction
(indicated by “|”).

Query Facilities

⟨query expression⟩ ::= ⟨query spec⟩ | ⟨structured query⟩
        | ⟨function⟩ (⟨query expression⟩)
        | ⟨nested query expression⟩
        | ⟨query expression⟩ ⟨set operator⟩ ⟨query expression⟩

⟨structured query⟩ ::= NEST ⟨nested query expression⟩ ON ⟨column list⟩
            [AS ⟨column name⟩]
        | UNNEST ⟨nested query expression⟩ ON ⟨column list⟩
        | ORDER ⟨nested query expression⟩ BY ⟨sort spec⟩...

⟨sort spec⟩ ::= {⟨unsigned integer⟩ | ⟨column name⟩} [ASC | DESC]

⟨query spec⟩ ::= ⟨select from spec⟩
            [WHERE ⟨search condition⟩ [PRESERVE ⟨table list⟩]]

⟨select from spec⟩ ::= SELECT ⟨select list⟩ FROM ⟨table list⟩ | ⟨table name⟩

⟨select list⟩ ::= ALL | ⟨column list⟩ | ⟨select spec⟩ [{, ⟨select spec⟩}...]

⟨select spec⟩ ::= ⟨column expression⟩ | ⟨reference name⟩.ALL

⟨column expression⟩ ::= ⟨value expression⟩ [AS ⟨column name⟩]

⟨table list⟩ ::= ⟨table spec⟩...

⟨table spec⟩ ::= ⟨nested query expression⟩ [AS ⟨column name⟩]

⟨search condition⟩ ::= ⟨boolean term⟩ | ⟨search condition⟩ OR ⟨boolean term⟩
⟨boolean term⟩ ::= ⟨boolean factor⟩ | ⟨boolean term⟩ AND ⟨boolean factor⟩

⟨boolean factor⟩ ::= [NOT] ⟨boolean primary⟩

⟨boolean primary⟩ ::= ⟨predicate⟩ | (⟨search condition⟩)

⟨predicate⟩ ::= ⟨comparison predicate⟩ | ⟨between predicate⟩
        | ⟨in predicate⟩ | ⟨like predicate⟩
        | ⟨exists predicate⟩ | ⟨null predicate⟩

⟨comparison predicate⟩ ::= ⟨value expression⟩ ⟨comp op⟩ ⟨value expression⟩

⟨comp op⟩ ::= = | < | > | <= | >= | <> | [NOT] ELEMENT OF
        | [NOT] CONTAINS | [NOT] SUBSET OF

⟨between predicate⟩ ::= ⟨value expression⟩ [NOT] BETWEEN
            ⟨value expression⟩ AND ⟨value expression⟩

⟨in predicate⟩ ::= ⟨value expression tuple list⟩ IN ⟨nested query expression⟩

⟨value expression tuple list⟩ ::= ⟨value expression⟩
        | < ⟨value expression⟩ [{, ⟨value expression⟩}...] >

⟨like predicate⟩ ::= ⟨not further defined⟩

⟨exists predicate⟩ ::= EXISTS ⟨nested query expression⟩

⟨null predicate⟩ ::= ⟨column spec⟩ IS [NOT] NULL

⟨nested query expression⟩ ::= ⟨table name⟩ | (⟨query expression⟩)

⟨column list⟩ ::= [ALL BUT] ⟨column spec⟩ [{, ⟨column spec⟩}...]

⟨function⟩ ::= MAX | MIN | AVG | SUM | COUNT | DISTINCT | SUBSUME

⟨set operator⟩ ::= UNION | DIFFERENCE | INTERSECT

⟨data type⟩ ::= ⟨character string type⟩ | ⟨numeric type⟩

⟨value expression⟩ ::= ⟨term⟩ | ⟨value expression⟩ {+|-} ⟨term⟩

⟨term⟩ ::= ⟨factor⟩ | ⟨term⟩ {*|/} ⟨factor⟩

⟨factor⟩ ::= [+|-] ⟨primary⟩

⟨primary⟩ ::= ⟨nested query expression⟩ | ⟨value spec⟩ | ⟨column spec⟩
        | (⟨value expression⟩)

⟨value list⟩ ::= ⟨value spec⟩...

⟨value spec⟩ ::= ⟨literal⟩ | NULL

⟨literal⟩ ::= ⟨character string literal⟩ | ⟨numeric literal⟩ | ⟨tuple literal⟩
        | ⟨don’t care literal⟩

⟨tuple literal⟩ ::= < ⟨value spec⟩ [{, ⟨value spec⟩}...] >

⟨column spec⟩ ::= [{⟨reference name⟩.}...] ⟨column name⟩
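
The value expression, term, factor, and primary productions above encode the usual arithmetic precedence and left associativity. The Python sketch below is a minimal recursive-descent evaluator for that fragment only, under stated assumptions: primaries are restricted to numeric literals and parenthesized expressions, the left-recursive productions are implemented as loops, and the names `tokenize`, `Parser`, and `evaluate` are hypothetical. It is not a parser for SQL/NF as a whole.

```python
import re

# Numeric literals, arithmetic operators, and parentheses.
TOKEN = re.compile(r"\s*(\d+(?:\.\d+)?|[-+*/()])")

def tokenize(text):
    """Split the input into tokens, rejecting unknown characters."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.i = 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def take(self):
        token = self.peek()
        self.i += 1
        return token

    def value_expression(self):
        # value expression: term, then any number of {+|-} term
        value = self.term()
        while self.peek() in ("+", "-"):
            op, rhs = self.take(), self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        # term: factor, then any number of {*|/} factor
        value = self.factor()
        while self.peek() in ("*", "/"):
            op, rhs = self.take(), self.factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def factor(self):
        # factor: optional sign followed by a primary
        sign = 1
        if self.peek() in ("+", "-"):
            sign = -1 if self.take() == "-" else 1
        return sign * self.primary()

    def primary(self):
        # primary (restricted here): numeric literal or ( value expression )
        token = self.take()
        if token == "(":
            value = self.value_expression()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return value
        if token is None or not token[0].isdigit():
            raise SyntaxError(f"unexpected token {token!r}")
        return float(token)

def evaluate(text):
    parser = Parser(tokenize(text))
    value = parser.value_expression()
    if parser.peek() is not None:
        raise SyntaxError("trailing input")
    return value

print(evaluate("2 + 3 * 4"))   # 14.0
print(evaluate("10 - 4 - 3"))  # 3.0
```

The second call shows the left associativity implied by the left-recursive productions: the expression groups as (10 - 4) - 3.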

DML 

⟨dml statement⟩ ::= ⟨store statement⟩ | ⟨modify statement⟩ | ⟨erase statement⟩

⟨store statement⟩ ::= STORE ⟨table name⟩ [(⟨column list⟩)] {VALUES ⟨value list⟩
        | ⟨query expression⟩}

⟨modify statement⟩ ::= MODIFY ⟨table name⟩ [AS ⟨reference name⟩]
            SET ⟨set clause⟩... [WHERE ⟨search condition⟩]

⟨set clause⟩ ::= ⟨column name⟩ = {⟨value expression⟩ | (⟨dml statement⟩)}

⟨erase statement⟩ ::= ERASE ⟨table name⟩ [AS ⟨reference name⟩]
            [WHERE ⟨search condition⟩]


DDL 

⟨ddl statement⟩ ::= ⟨schema⟩ | ⟨scheme⟩

⟨schema⟩ ::= SCHEMA {⟨table definition⟩ | ⟨view definition⟩}...

⟨table definition⟩ ::= TABLE ⟨table name⟩ {⟨table element⟩... | ⟨scheme name⟩}

⟨table element⟩ ::= ⟨column specification⟩
        | CONSTRAINTS ⟨table constraint definition⟩...

⟨column specification⟩ ::= ITEM {⟨column definition⟩ | (⟨table definition⟩)}

⟨column definition⟩ ::= ⟨column name⟩ ⟨data type⟩
            [⟨column constraint spec⟩...] [⟨default clause⟩]

⟨column constraint spec⟩ ::= ⟨not null clause⟩ | ⟨unique clause⟩
        | ⟨references clause⟩ | ⟨check clause⟩

⟨not null clause⟩ ::= NOT NULL

⟨unique clause⟩ ::= UNIQUE

⟨references clause⟩ ::= REFERENCES ⟨column spec⟩ [⟨update rule⟩]
            [⟨delete rule⟩]

⟨check clause⟩ ::= CHECK ⟨search condition⟩

⟨default clause⟩ ::= DEFAULT ⟨literal⟩

⟨table constraint definition⟩ ::= ⟨unique constraint definition⟩
        | ⟨referential constraint definition⟩
        | ⟨check constraint definition⟩

⟨unique constraint definition⟩ ::= UNIQUE ⟨column list⟩

⟨referential constraint definition⟩ ::= REFERENCES ⟨column list⟩
            WITH ⟨column list⟩
            [⟨update rule⟩] [⟨delete rule⟩]

⟨update rule⟩ ::= ⟨action⟩ MODIFY
⟨delete rule⟩ ::= ⟨action⟩ ERASE
⟨action⟩ ::= CASCADE | NULLIFY | RESTRICT

⟨check constraint definition⟩ ::= CHECK ⟨search condition⟩ [⟨defer clause⟩]

⟨defer clause⟩ ::= IMMEDIATE | DEFERRED

⟨view definition⟩ ::= VIEW ⟨table name⟩ AS ⟨query expression⟩

⟨scheme⟩ ::= SCHEME ⟨scheme definition⟩...

⟨scheme definition⟩ ::= TABLE ⟨scheme name⟩ ⟨table element⟩...
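
The column specification production is what makes table definitions recursive: an ITEM is either an atomic column or a parenthesized table definition. The Python sketch below models that recursion and renders a definition back to text following the TABLE and ITEM productions. Everything in it is an assumption for illustration: the classes `Column` and `Table`, the functions `render` and `nesting_depth`, the sample table, and in particular the type names CHAR and INTEGER, since the grammar does not spell out the data types. Constraints and defaults are omitted.

```python
from dataclasses import dataclass, field
from typing import List, Union

@dataclass
class Column:
    name: str
    data_type: str  # hypothetical type name; the grammar leaves types open

@dataclass
class Table:
    name: str
    items: List[Union[Column, "Table"]] = field(default_factory=list)

def render(table):
    """Render a table definition using the TABLE and ITEM productions."""
    parts = [f"TABLE {table.name}"]
    for item in table.items:
        if isinstance(item, Table):
            parts.append(f"ITEM ({render(item)})")   # relation-valued item
        else:
            parts.append(f"ITEM {item.name} {item.data_type}")
    return " ".join(parts)

def nesting_depth(table):
    """Return 1 for a flat table, plus one per level of nested tables."""
    nested = [nesting_depth(i) for i in table.items if isinstance(i, Table)]
    return 1 + max(nested, default=0)

# Hypothetical employee table with a nested relation of children.
emp = Table("Emp", [
    Column("ename", "CHAR"),
    Table("Children", [Column("name", "CHAR"), Column("age", "INTEGER")]),
])
print(render(emp))
print(nesting_depth(emp))  # 2
```

A table whose items are all atomic columns has depth 1, which corresponds to an ordinary 1NF relation.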